Posted to commits@impala.apache.org by mj...@apache.org on 2017/02/17 23:17:04 UTC

[1/9] incubator-impala git commit: IMPALA-4937: Remove unused kudu scanner keep alive variable

Repository: incubator-impala
Updated Branches:
  refs/heads/master 26eaa2660 -> 815c76f9c


IMPALA-4937: Remove unused kudu scanner keep alive variable

The gflag kudu_scanner_keep_alive_period_us is defined but
never used. The same functionality is provided by
kudu_scanner_keep_alive_period_sec (in seconds), which is the
flag actually in effect. This patch removes the unused
kudu_scanner_keep_alive_period_us flag.

Change-Id: I99f56f93f1ff8543cbe415542464a053b24968e2
Reviewed-on: http://gerrit.cloudera.org:8080/6021
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/aee5457a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/aee5457a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/aee5457a

Branch: refs/heads/master
Commit: aee5457a55d78203576b85440bc94fd3f6469fe0
Parents: 26eaa26
Author: Matthew Jacobs <mj...@cloudera.com>
Authored: Wed Feb 15 11:01:29 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Feb 16 22:25:13 2017 +0000

----------------------------------------------------------------------
 be/src/exec/kudu-scan-node.cc | 3 ---
 1 file changed, 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/aee5457a/be/src/exec/kudu-scan-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/kudu-scan-node.cc b/be/src/exec/kudu-scan-node.cc
index 630f5b7..a238e3c 100644
--- a/be/src/exec/kudu-scan-node.cc
+++ b/be/src/exec/kudu-scan-node.cc
@@ -44,9 +44,6 @@
 
 DEFINE_int32(kudu_max_row_batches, 0, "The maximum size of the row batch queue, "
     " for Kudu scanners.");
-DEFINE_int32(kudu_scanner_keep_alive_period_us, 15 * 1000L * 1000L,
-    "The period at which Kudu Scanners should send keep-alive requests to the tablet "
-    "server to ensure that scanners do not time out.");
 
 using boost::algorithm::to_lower_copy;
 using kudu::client::KuduClient;


[5/9] incubator-impala git commit: [DOCS] Major update to Impala + Kudu page

[DOCS] Major update to Impala + Kudu page

Update the page with details of the latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than the actual changes
to the page, because subtopics were moved around and
promoted/demoted (which changes the indentation). It is
best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.
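
As a sketch of the three kinds of examples referred to above
(table and column names here are illustrative, not taken from
the docs themselves; the external-table mapping follows the
TBLPROPERTIES('kudu.table_name') form shown in the patch):

```sql
-- Basic Kudu table: PRIMARY KEY and PARTITION BY are part of the DDL.
CREATE TABLE kudu_basic (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;

-- External Kudu table: map an Impala name onto a table created
-- outside Impala, for example through the Kudu API.
CREATE EXTERNAL TABLE kudu_external
STORED AS KUDU
TBLPROPERTIES ('kudu.table_name' = 'original_kudu_name');

-- Kudu CTAS: the primary key and partitioning clauses precede
-- STORED AS KUDU and the SELECT.
CREATE TABLE kudu_ctas
PRIMARY KEY (id)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU
AS SELECT id, name FROM kudu_basic;
```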

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Reviewed-on: http://gerrit.cloudera.org:8080/5649
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/661921b2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/661921b2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/661921b2

Branch: refs/heads/master
Commit: 661921b205caf5b894f0f8803418c302e7a55293
Parents: aee5457
Author: John Russell <jr...@cloudera.com>
Authored: Mon Jan 9 14:17:23 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Feb 17 01:10:12 2017 +0000

----------------------------------------------------------------------
 docs/impala_keydefs.ditamap                |    2 +
 docs/shared/impala_common.xml              |   52 +
 docs/topics/impala_alter_table.xml         |   80 +-
 docs/topics/impala_array.xml               |    3 +
 docs/topics/impala_boolean.xml             |    3 +
 docs/topics/impala_char.xml                |    3 +
 docs/topics/impala_compute_stats.xml       |   14 +-
 docs/topics/impala_create_table.xml        |  924 +++++++++++-----
 docs/topics/impala_decimal.xml             |    3 +
 docs/topics/impala_describe.xml            |   85 ++
 docs/topics/impala_double.xml              |    3 +
 docs/topics/impala_drop_table.xml          |    9 +
 docs/topics/impala_explain.xml             |   36 +
 docs/topics/impala_float.xml               |    3 +
 docs/topics/impala_grant.xml               |    3 +
 docs/topics/impala_invalidate_metadata.xml |    5 +
 docs/topics/impala_kudu.xml                | 1331 +++++++++++++++++++++--
 docs/topics/impala_literals.xml            |   18 +
 docs/topics/impala_map.xml                 |    3 +
 docs/topics/impala_partitioning.xml        |    8 +-
 docs/topics/impala_refresh.xml             |    5 +
 docs/topics/impala_reserved_words.xml      |   16 +-
 docs/topics/impala_revoke.xml              |    3 +
 docs/topics/impala_show.xml                |  173 ++-
 docs/topics/impala_struct.xml              |    3 +
 docs/topics/impala_tables.xml              |  145 ++-
 docs/topics/impala_timestamp.xml           |    3 +
 docs/topics/impala_truncate_table.xml      |    3 +
 docs/topics/impala_varchar.xml             |    3 +
 29 files changed, 2590 insertions(+), 352 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index 4fe8813..2562df9 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10285,6 +10285,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&amp;jqlQuery=p
   <keydef keys="impala25"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala24"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala23"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
+  <keydef keys="impala223"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala22"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala21"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala20"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>
@@ -10298,6 +10299,7 @@ https://issues.cloudera.org/secure/IssueNavigator.jspa?reset=true&amp;jqlQuery=p
   <keydef keys="impala25_full"><topicmeta><keywords><keyword>Impala 2.5</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala24_full"><topicmeta><keywords><keyword>Impala 2.4</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala23_full"><topicmeta><keywords><keyword>Impala 2.3</keyword></keywords></topicmeta></keydef>
+  <keydef keys="impala223_full"><topicmeta><keywords><keyword>Impala 2.2.3</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala22_full"><topicmeta><keywords><keyword>Impala 2.2</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala21_full"><topicmeta><keywords><keyword>Impala 2.1</keyword></keywords></topicmeta></keydef>
   <keydef keys="impala20_full"><topicmeta><keywords><keyword>Impala 2.0</keyword></keywords></topicmeta></keydef>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 1b8c171..4a9aa32 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -3730,6 +3730,58 @@ sudo pip-python install ssl</codeblock>
         NULL</codeph> attribute to that column.
       </p>
 
+      <p id="kudu_metadata_intro" rev="kudu">
+        Much of the metadata for Kudu tables is handled by the underlying
+        storage layer. Kudu tables have less reliance on the metastore
+        database, and require less metadata caching on the Impala side.
+        For example, information about partitions in Kudu tables is managed
+        by Kudu, and Impala does not cache any block locality metadata
+        for Kudu tables.
+      </p>
+
+      <p id="kudu_metadata_details" rev="kudu">
+        The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph>
+        statements are needed less frequently for Kudu tables than for
+        HDFS-backed tables. Neither statement is needed when data is
+        added to, removed, or updated in a Kudu table, even if the changes
+        are made directly to Kudu through a client program using the Kudu API.
+        Run <codeph>REFRESH <varname>table_name</varname></codeph> or
+        <codeph>INVALIDATE METADATA <varname>table_name</varname></codeph>
+        for a Kudu table only after making a change to the Kudu table schema,
+        such as adding or dropping a column, by a mechanism other than
+        Impala.
+      </p>
+ 
+      <p id="kudu_internal_external_tables">
+        The distinction between internal and external tables has some special
+        details for Kudu tables. Tables created entirely through Impala are
+        internal tables. The table name as represented within Kudu includes
+        notation such as an <codeph>impala::</codeph> prefix and the Impala
+        database name. External Kudu tables are those created by a non-Impala
+        mechanism, such as a user application calling the Kudu APIs. For
+        these tables, the <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets
+        you establish a mapping from Impala to the existing Kudu table:
+<codeblock>
+CREATE EXTERNAL TABLE impala_name STORED AS KUDU
+  TBLPROPERTIES('kudu.table_name' = 'original_kudu_name');
+</codeblock>
+        External Kudu tables differ in one important way from other external
+        tables: adding or dropping a column or range partition changes the
+        data in the underlying Kudu table, in contrast to an HDFS-backed
+        external table where existing data files are left untouched.
+      </p>
+
+      <p id="kudu_sentry_limitations" rev="IMPALA-4000">
+        Access to Kudu tables must be granted to and revoked from roles as usual.
+        Only users with <codeph>ALL</codeph> privileges on <codeph>SERVER</codeph> can create external Kudu tables.
+        Currently, access to a Kudu table is <q>all or nothing</q>:
+        enforced at the table level rather than the column level, and applying to all
+        SQL operations rather than individual statements such as <codeph>INSERT</codeph>.
+        Because non-SQL APIs can access Kudu data without going through Sentry
+        authorization, currently the Sentry support is considered preliminary
+        and subject to change.
+      </p>
+
     </section>
 
   </conbody>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_alter_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_alter_table.xml b/docs/topics/impala_alter_table.xml
index c4df150..a3f1e19 100644
--- a/docs/topics/impala_alter_table.xml
+++ b/docs/topics/impala_alter_table.xml
@@ -34,6 +34,7 @@ under the License.
       <data name="Category" value="S3"/>
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Kudu"/>
     </metadata>
   </prolog>
 
@@ -63,9 +64,11 @@ ALTER TABLE <varname>name</varname> REPLACE COLUMNS (<varname>col_spec</varname>
 ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] PARTITION (<varname>partition_spec</varname>)
   <ph rev="IMPALA-4390">[<varname>location_spec</varname>]</ph>
   <ph rev="IMPALA-4390">[<varname>cache_spec</varname>]</ph>
+<ph rev="kudu">ALTER TABLE <varname>name</varname> ADD [IF NOT EXISTS] RANGE PARTITION (<varname>kudu_partition_spec</varname>)</ph>
 
 ALTER TABLE <varname>name</varname> DROP [IF EXISTS] PARTITION (<varname>partition_spec</varname>)
   <ph rev="2.3.0">[PURGE]</ph>
+<ph rev="kudu">ALTER TABLE <varname>name</varname> DROP [IF EXISTS] RANGE PARTITION <varname>kudu_partition_spec</varname></ph>
 
 <ph rev="2.3.0 IMPALA-1568 CDH-36799">ALTER TABLE <varname>name</varname> RECOVER PARTITIONS</ph>
 
@@ -86,12 +89,18 @@ statsKey ::= numDVs | numNulls | avgSize | maxSize</ph>
 
 <varname>col_spec</varname> ::= <varname>col_name</varname> <varname>type_name</varname>
 
-<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
+<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
 
 <varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
 
 <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname> ::= <varname>comparison_expression_on_partition_col</varname></ph>
 
+<ph rev="kudu"><varname>kudu_partition_spec</varname> ::= <varname>constant</varname> <varname>range_operator</varname> VALUES <varname>range_operator</varname> <varname>constant</varname> | VALUE = <varname>constant</varname></ph>
+
+<ph rev="IMPALA-4390">cache_spec ::= CACHED IN '<varname>pool_name</varname>' [WITH REPLICATION = <varname>integer</varname>] | UNCACHED</ph>
+
+<ph rev="IMPALA-4390">location_spec ::= LOCATION '<varname>hdfs_path_of_directory</varname>'</ph>
+
 <varname>table_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
 
 <varname>serde_properties</varname> ::= '<varname>name</varname>'='<varname>value</varname>'[, '<varname>name</varname>'='<varname>value</varname>' ...]
@@ -896,6 +905,75 @@ alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</cod
       require write and execute permissions for the associated partition directory.
     </p>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+    <p rev="kudu IMPALA-2890">
+      Because of the extra constraints and features of Kudu tables, such as the <codeph>NOT NULL</codeph>
+      and <codeph>DEFAULT</codeph> attributes for columns, <codeph>ALTER TABLE</codeph> has specific
+      requirements related to Kudu tables:
+      <ul>
+        <li>
+          <p>
+            In an <codeph>ADD COLUMNS</codeph> operation, you can specify the <codeph>NULL</codeph>,
+            <codeph>NOT NULL</codeph>, and <codeph>DEFAULT <varname>default_value</varname></codeph>
+            column attributes.
+          </p>
+        </li>
+        <li>
+          <p>
+            If you add a column with a <codeph>NOT NULL</codeph> attribute, it must also have a
+            <codeph>DEFAULT</codeph> attribute, so the default value can be assigned to that
+            column for all existing rows.
+          </p>
+        </li>
+        <li>
+          <p>
+            The <codeph>DROP COLUMN</codeph> clause works the same for a Kudu table as for other
+            kinds of tables.
+          </p>
+        </li>
+        <li>
+          <p>
+            Although you can change the name of a column with the <codeph>CHANGE</codeph> clause,
+            you cannot change the type of a column in a Kudu table.
+          </p>
+        </li>
+        <li>
+          <p>
+            You cannot assign the <codeph>ENCODING</codeph>, <codeph>COMPRESSION</codeph>,
+            or <codeph>BLOCK_SIZE</codeph> attributes when adding a column.
+          </p>
+        </li>
+        <li>
+          <p>
+            You cannot change the default value, nullability, encoding, compression, or block size
+            of existing columns in a Kudu table.
+          </p>
+        </li>
+        <li>
+          <p>
+            You cannot use the <codeph>REPLACE COLUMNS</codeph> clause with a Kudu table.
+          </p>
+        </li>
+        <li>
+          <p>
+            The <codeph>RENAME TO</codeph> clause for a Kudu table only affects the name stored in the
+            metastore database that Impala uses to refer to the table. To change which underlying Kudu
+            table is associated with an Impala table name, you must change the <codeph>TBLPROPERTIES</codeph>
+            property of the table: <codeph>SET TBLPROPERTIES('kudu.table_name'='<varname>kudu_tbl_name</varname>)</codeph>.
+            Doing so causes Kudu to change the name of the underlying Kudu table.
+          </p>
+        </li>
+      </ul>
+    </p>
+
+    <p rev="kudu">
+      Kudu tables all use an underlying partitioning mechanism. The partition syntax is different than for non-Kudu
+      tables. You can use the <codeph>ALTER TABLE</codeph> statement to add and drop <term>range partitions</term>
+      from a Kudu table. Any new range must not overlap with any existing ranges. Dropping a range removes all the associated
+      rows from the table. See <xref href="impala_kudu.xml#kudu_partitioning"/> for details.
+    </p>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_array.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_array.xml b/docs/topics/impala_array.xml
index be26874..f882b97 100644
--- a/docs/topics/impala_array.xml
+++ b/docs/topics/impala_array.xml
@@ -115,6 +115,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
         <li/>
       </ul>
 
+      <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+      <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
       <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
       <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_boolean.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_boolean.xml b/docs/topics/impala_boolean.xml
index fcb3ec7..1e0690f 100644
--- a/docs/topics/impala_boolean.xml
+++ b/docs/topics/impala_boolean.xml
@@ -161,6 +161,9 @@ SELECT claim FROM assertions WHERE really = TRUE;
 
 <!-- <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> -->
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
+
 <!-- <p conref="../shared/impala_common.xml#common/related_info"/> -->
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_char.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_char.xml b/docs/topics/impala_char.xml
index ca6f314..dc8ad5a 100644
--- a/docs/topics/impala_char.xml
+++ b/docs/topics/impala_char.xml
@@ -243,6 +243,9 @@ select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c fr
 +------------------------+----------------------------------+--------------------------------------------+
 </codeblock>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index 91f45c2..bd21dae 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -52,8 +52,7 @@ under the License.
 <codeblock rev="2.1.0">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname>
 COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varname> [PARTITION (<varname>partition_spec</varname>)]
 
-<!-- Is kudu_partition_spec applicable here? -->
-<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph> | <ph rev="kudu"><varname>kudu_partition_spec</varname></ph>
+<varname>partition_spec</varname> ::= <varname>simple_partition_spec</varname> | <ph rev="IMPALA-1654"><varname>complex_partition_spec</varname></ph>
 
 <varname>simple_partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
 
@@ -523,6 +522,17 @@ show table stats item_partitioned;
       against the table.)
     </p>
 
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+    <p rev="IMPALA-2830">
+      The <codeph>COMPUTE STATS</codeph> statement applies to Kudu tables.
+      Impala does not compute the number of rows for each partition for
+      Kudu tables. Therefore, you do not need to re-run the operation when
+      you see -1 in the <codeph># Rows</codeph> column of the output from
+      <codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for
+      all Kudu tables. 
+    </p>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>


[3/9] incubator-impala git commit: [DOCS] Major update to Impala + Kudu page

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_kudu.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml
index ef90e63..cf08671 100644
--- a/docs/topics/impala_kudu.xml
+++ b/docs/topics/impala_kudu.xml
@@ -20,7 +20,7 @@ under the License.
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept id="impala_kudu" rev="kudu">
 
-  <title>Using Impala to Query Kudu Tables</title>
+  <title id="kudu">Using Impala to Query Kudu Tables</title>
 
   <prolog>
     <metadata>
@@ -36,19 +36,30 @@ under the License.
 
     <p>
       <indexterm audience="hidden">Kudu</indexterm>
-      You can use Impala to query Kudu tables. This capability allows convenient access to a storage system that is
-      tuned for different kinds of workloads than the default with Impala. The default Impala tables use data files
-      stored on HDFS, which are ideal for bulk loads and queries using full-table scans. In contrast, Kudu can do
-      efficient queries for data organized either in data warehouse style (with full table scans) or for OLTP-style
-      workloads (with key-based lookups for single rows or small ranges of values).
+      You can use Impala to query tables stored by Apache Kudu. This capability
+      allows convenient access to a storage system that is tuned for different kinds of
+      workloads than the default with Impala.
     </p>
 
     <p>
-      Certain Impala SQL statements, such as <codeph>UPDATE</codeph> and <codeph>DELETE</codeph>, only work with
-      Kudu tables. These operations were impractical from a performance perspective to perform at large scale on
-      HDFS data, or on HBase tables.
+      By default, Impala tables are stored on HDFS using data files with various file formats.
+      HDFS files are ideal for bulk loads (append operations) and queries using full-table scans,
+      but do not support in-place updates or deletes. Kudu is an alternative storage engine used
+      by Impala which can do both in-place updates (for mixed read/write workloads) and fast scans
+      (for data-warehouse/analytic operations). Using Kudu tables with Impala can simplify the
+      ETL pipeline by avoiding extra steps to segregate and reorganize newly arrived data.
     </p>
 
+    <p>
+      Certain Impala SQL statements and clauses, such as <codeph>DELETE</codeph>,
+      <codeph>UPDATE</codeph>, <codeph>UPSERT</codeph>, and <codeph>PRIMARY KEY</codeph> work
+      only with Kudu tables. Other statements and clauses, such as <codeph>LOAD DATA</codeph>,
+      <codeph>TRUNCATE TABLE</codeph>, and <codeph>INSERT OVERWRITE</codeph>, are not applicable
+      to Kudu tables.
+    </p>
+
+    <p outputclass="toc inpage"/>
+
   </conbody>
 
   <concept id="kudu_benefits">
@@ -58,50 +69,1097 @@ under the License.
     <conbody>
 
       <p>
-        The combination of Kudu and Impala works best for tables where scan performance is important, but data
-        arrives continuously, in small batches, or needs to be updated without being completely replaced. In these
-        scenarios (such as for streaming data), it might be impractical to use Parquet tables because Parquet works
-        best with multi-megabyte data files, requiring substantial overhead to replace or reorganize data files to
-        accomodate frequent additions or changes to data. Impala can query Kudu tables with scan performance close
-        to that of Parquet, and Impala can also perform update or delete operations without replacing the entire
-        table contents. You can also use the Kudu API to do ingestion or transformation operations outside of
-        Impala, and Impala can query the current data at any time.
+        The combination of Kudu and Impala works best for tables where scan performance is
+        important, but data arrives continuously, in small batches, or needs to be updated
+        without being completely replaced. HDFS-backed tables can require substantial overhead
+        to replace or reorganize data files as new data arrives. Impala can perform efficient
+        lookups and scans within Kudu tables, and Impala can also perform update or
+        delete operations efficiently. You can also use the Kudu Java, C++, and Python APIs to
+        do ingestion or transformation operations outside of Impala, and Impala can query the
+        current data at any time.
       </p>
 
     </conbody>
 
   </concept>
 
-  <concept id="kudu_primary_key">
+  <concept id="kudu_config">
 
-    <title>Primary Key Columns for Kudu Tables</title>
+    <title>Configuring Impala for Use with Kudu</title>
 
     <conbody>
 
       <p>
-        Kudu tables introduce the notion of primary keys to Impala for the first time. The primary key is made up
-        of one or more columns, whose values are combined and used as a lookup key during queries. These columns
-        cannot contain any <codeph>NULL</codeph> values or any duplicate values, and can never be updated. For a
-        partitioned Kudu table, all the partition key columns must come from the set of primary key columns.
+        The <codeph>-kudu_master_hosts</codeph> configuration property must be set correctly
+        for the <cmdname>impalad</cmdname> daemon, for <codeph>CREATE TABLE ... STORED AS
+        KUDU</codeph> statements to connect to the appropriate Kudu server. Typically, the
+        required value for this setting is <codeph><varname>kudu_host</varname>:7051</codeph>.
+        In a high-availability Kudu deployment, specify the names of multiple Kudu hosts separated by commas.
       </p>
 
       <p>
-        Impala itself still does not have the notion of unique or non-<codeph>NULL</codeph> constraints. These
-        restrictions on the primary key columns are enforced on the Kudu side.
+        If the <codeph>-kudu_master_hosts</codeph> configuration property is not set, you can
+        still associate the appropriate value for each table by specifying a
+        <codeph>TBLPROPERTIES('kudu.master_addresses')</codeph> clause in the <codeph>CREATE TABLE</codeph> statement or
+        changing the <codeph>TBLPROPERTIES('kudu.master_addresses')</codeph> value with an <codeph>ALTER TABLE</codeph>
+        statement.
       </p>
 
+    </conbody>
+
+    <concept id="kudu_topology">
+
+      <title>Cluster Topology for Kudu Tables</title>
+
+      <conbody>
+
+        <p>
+          With HDFS-backed tables, you are typically concerned with the number of DataNodes in
+          the cluster, how many and how large HDFS data files are read during a query, and
+          therefore the amount of work performed by each DataNode and the network communication
+          to combine intermediate results and produce the final result set.
+        </p>
+
+        <p>
+          With Kudu tables, the topology considerations are different, because:
+        </p>
+
+        <ul>
+          <li>
+            <p>
+              The underlying storage is managed and organized by Kudu, not represented as HDFS
+              data files.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              Kudu handles some of the underlying mechanics of partitioning the data. You can specify
+              the partitioning scheme with combinations of hash and range partitioning, so that you can
+              decide how much effort to expend to manage the partitions as new data arrives. For example,
+              you can construct partitions that apply to date ranges rather than a separate partition for each
+              day or each hour.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              Data is physically divided based on units of storage called <term>tablets</term>. Tablets are
+              stored by <term>tablet servers</term>. Each tablet server can store multiple tablets,
+              and each tablet is replicated across multiple tablet servers, managed automatically by Kudu.
+              Where practical, colocate the tablet servers on the same hosts as the DataNodes, although that is not required.
+            </p>
+          </li>
+        </ul>
+
+        <p>
+          One consideration for the cluster topology is that the number of replicas for a Kudu table
+          must be odd.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="kudu_ddl">
+
+    <title>Impala DDL Enhancements for Kudu Tables (CREATE TABLE and ALTER TABLE)</title>
+
+    <prolog>
+      <metadata>
+        <data name="Category" value="DDL"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
       <p>
-        The primary key columns must be the first ones specified in the <codeph>CREATE TABLE</codeph> statement.
-        You specify which column or columns make up the primary key in the table properties, rather than through
-        attributes in the column list.
+        You can use the Impala <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>
+        statements to create and fine-tune the characteristics of Kudu tables. Because Kudu
+        tables have features and properties that do not apply to other kinds of Impala tables,
+        familiarize yourself with Kudu-related concepts and syntax first.
+        For the general syntax of the <codeph>CREATE TABLE</codeph>
+        statement for Kudu tables, see <xref keyref="create_table"/>.
+      </p>
+
+      <p outputclass="toc inpage"/>
+
+    </conbody>
+
+    <concept id="kudu_primary_key">
+
+      <title>Primary Key Columns for Kudu Tables</title>
+
+      <conbody>
+
+        <p>
+          Kudu tables introduce the notion of primary keys to Impala. The primary key is
+          made up of one or more columns, whose values are combined and used as a lookup key
+          during queries. The tuple represented by these columns must be unique, cannot
+          contain any <codeph>NULL</codeph> values, and can never be updated once it is
+          inserted. For a Kudu table, all the partition key columns must come from the set of
+          primary key columns.
+        </p>
+
+        <p>
+          The primary key has both physical and logical aspects:
+        </p>
+
+        <ul>
+          <li>
+            <p>
+              On the physical side, it is used to map the data values to particular tablets for fast retrieval.
+              Because the tuples formed by the primary key values are unique, the primary key columns are typically
+              highly selective.
+            </p>
+          </li>
+          <li>
+            <p>
+              On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
+              For example, if an <codeph>INSERT</codeph> operation fails partway through, only some of the
+              new rows might be present in the table. You can re-run the same <codeph>INSERT</codeph>, and
+              only the missing rows will be added. Or if data in the table is stale, you can run an
+              <codeph>UPSERT</codeph> statement that brings the data up to date, without the possibility
+              of creating duplicate copies of existing rows.
+            </p>
+          </li>
+        </ul>
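+        <p>
+          For example, with a hypothetical table keyed on a single <codeph>id</codeph> column,
+          re-running the same statements does not produce duplicate rows:
+        </p>
+
+<codeblock>
+CREATE TABLE events (id BIGINT PRIMARY KEY, details STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+INSERT INTO events VALUES (1, 'first'), (2, 'second');
+
+-- Re-running the same INSERT does not add duplicate rows;
+-- rows whose primary key already exists are skipped.
+INSERT INTO events VALUES (1, 'first'), (2, 'second');
+
+-- UPSERT updates the existing row for key 2 and inserts a new row for key 3.
+UPSERT INTO events VALUES (2, 'second, revised'), (3, 'third');
+</codeblock>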
+
+        <note>
+          <p>
+            Impala only allows <codeph>PRIMARY KEY</codeph> clauses and <codeph>NOT NULL</codeph>
+            constraints on columns for Kudu tables. These constraints are enforced on the Kudu side.
+          </p>
+        </note>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="kudu_column_attributes" rev="IMPALA-3726">
+
+      <title>Kudu-Specific Column Attributes for CREATE TABLE</title>
+
+      <conbody>
+
+        <p>
+          For the general syntax of the <codeph>CREATE TABLE</codeph>
+          statement for Kudu tables, see <xref keyref="create_table"/>.
+          The following sections provide more detail for some of the
+          Kudu-specific keywords you can use in column definitions.
+        </p>
+
+        <p>
+          The column list in a <codeph>CREATE TABLE</codeph> statement can include the following
+          attributes, which only apply to Kudu tables:
+        </p>
+
+<codeblock>
+  PRIMARY KEY
+| [NOT] NULL
+| ENCODING <varname>codec</varname>
+| COMPRESSION <varname>algorithm</varname>
+| DEFAULT <varname>constant_expression</varname>
+| BLOCK_SIZE <varname>number</varname>
+</codeblock>
+
+        <p outputclass="toc inpage">
+          See the following sections for details about each column attribute.
+        </p>
+
+      </conbody>
+
+      <concept id="kudu_primary_key_attribute">
+
+        <title>PRIMARY KEY Attribute</title>
+
+        <conbody>
+
+          <p>
+            The primary key for a Kudu table is a column, or set of columns, that uniquely
+            identifies every row. The primary key is also used as the natural sort order
+            for the values in the table. The primary key value for each row is based on the
+            combination of values for those columns.
+          </p>
+
+          <p conref="../shared/impala_common.xml#common/pk_implies_not_null"/>
+
+          <p>
+            The primary key columns must be the first ones specified in the <codeph>CREATE
+            TABLE</codeph> statement. For a single-column primary key, you can include a
+            <codeph>PRIMARY KEY</codeph> attribute inline with the column definition. For a
+            multi-column primary key, you include a <codeph>PRIMARY KEY (<varname>c1</varname>,
+            <varname>c2</varname>, ...)</codeph> clause as a separate entry at the end of the
+            column list.
+          </p>
+
+          <p>
+            You can specify the <codeph>PRIMARY KEY</codeph> attribute either inline in a single
+            column definition, or as a separate clause at the end of the column list:
+          </p>
+
+<codeblock>
+CREATE TABLE pk_inline
+(
+  col1 BIGINT PRIMARY KEY,
+  col2 STRING,
+  col3 BOOLEAN
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+
+CREATE TABLE pk_at_end
+(
+  col1 BIGINT,
+  col2 STRING,
+  col3 BOOLEAN,
+  PRIMARY KEY (col1)
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+</codeblock>
+
+          <p>
+            When the primary key is a single column, these two forms are equivalent. If the
+            primary key consists of more than one column, you must specify the primary key using
+            a separate entry in the column list:
+          </p>
+
+<codeblock>
+CREATE TABLE pk_multiple_columns
+(
+  col1 BIGINT,
+  col2 STRING,
+  col3 BOOLEAN,
+  <b>PRIMARY KEY (col1, col2)</b>
+) PARTITION BY HASH(col2) PARTITIONS 2 STORED AS KUDU;
+</codeblock>
+
+          <p>
+            The <codeph>SHOW CREATE TABLE</codeph> statement always represents the
+            <codeph>PRIMARY KEY</codeph> specification as a separate item in the column list:
+          </p>
+
+<codeblock>
+CREATE TABLE inline_pk_rewritten (id BIGINT <b>PRIMARY KEY</b>, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+SHOW CREATE TABLE inline_pk_rewritten;
++------------------------------------------------------------------------------+
+| result                                                                       |
++------------------------------------------------------------------------------+
+| CREATE TABLE user.inline_pk_rewritten (                                      |
+|   id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+|   s STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,      |
+|   <b>PRIMARY KEY (id)</b>                                                           |
+| )                                                                            |
+| PARTITION BY HASH (id) PARTITIONS 2                                          |
+| STORED AS KUDU                                                               |
+| TBLPROPERTIES ('kudu.master_addresses'='host.example.com')                   |
++------------------------------------------------------------------------------+
+</codeblock>
+
+          <p>
+            The notion of a primary key applies only to Kudu tables. Every Kudu table requires
+            a primary key, consisting of one or more columns, and the primary key columns must
+            be specified first in the column list.
+          </p>
+
+          <p>
+            The contents of the primary key columns cannot be changed by an
+            <codeph>UPDATE</codeph> or <codeph>UPSERT</codeph> statement. Including too many
+            columns in the primary key (more than 5 or 6) can also reduce the performance of
+            write operations. Therefore, pick the most selective and most frequently
+            tested non-null columns for the primary key specification.
+            If a column must always have a value, but that value
+            might change later, leave it out of the primary key and use a <codeph>NOT
+            NULL</codeph> clause for that column instead. If an existing row has an
+            incorrect or outdated key column value, delete the old row and insert an entirely
+            new row with the correct primary key.
+          </p>
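+          <p>
+            For example, with a hypothetical table <codeph>t</codeph> keyed on
+            <codeph>id</codeph>, correcting a key value involves a delete and an insert
+            rather than an <codeph>UPDATE</codeph>:
+          </p>
+
+<codeblock>
+-- Primary key values cannot be changed in place.
+-- Remove the row with the incorrect key, then insert a replacement row.
+DELETE FROM t WHERE id = 2;
+INSERT INTO t VALUES (200, 'corrected row');
+</codeblock>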
+
+        </conbody>
+
+      </concept>
+
+      <concept id="kudu_not_null_attribute">
+
+        <title>NULL | NOT NULL Attribute</title>
+
+        <conbody>
+
+          <p>
+            For Kudu tables, you can specify which columns can contain nulls or not. This
+            constraint offers an extra level of consistency enforcement for Kudu tables. If an
+            application requires a field to always be specified, include a <codeph>NOT
+            NULL</codeph> clause in the corresponding column definition, and Kudu prevents rows
+            from being inserted with a <codeph>NULL</codeph> in that column.
+          </p>
+
+          <p>
+            For example, a table containing geographic information might require the latitude
+            and longitude coordinates to always be specified. Other attributes might be allowed
+            to be <codeph>NULL</codeph>. For example, a location might not have a designated
+            place name, its altitude might be unimportant, and its population might be initially
+            unknown, to be filled in later.
+          </p>
+
+          <p conref="../shared/impala_common.xml#common/pk_implies_not_null"/>
+
+          <p>
+            For non-Kudu tables, Impala allows any column to contain <codeph>NULL</codeph>
+            values, because it is not practical to enforce a <q>not null</q> constraint on HDFS
+            data files that could be prepared using external tools and ETL processes.
+          </p>
+
+<codeblock>
+CREATE TABLE required_columns
+(
+  id BIGINT PRIMARY KEY,
+  latitude DOUBLE NOT NULL,
+  longitude DOUBLE NOT NULL,
+  place_name STRING,
+  altitude DOUBLE,
+  population BIGINT
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</codeblock>
+
+          <p>
+            During performance optimization, Kudu can use the knowledge that nulls are not
+            allowed to skip certain checks on each input row, speeding up queries and join
+            operations. Therefore, specify <codeph>NOT NULL</codeph> constraints when
+            appropriate.
+          </p>
+
+          <p>
+            The <codeph>NULL</codeph> clause is the default condition for all columns that are not
+            part of the primary key. You can omit it, or specify it to clarify that you have made a
+            conscious design decision to allow nulls in a column.
+          </p>
+
+          <p>
+            Because primary key columns cannot contain any <codeph>NULL</codeph> values, the
+            <codeph>NOT NULL</codeph> clause is not required for the primary key columns,
+            but you might still specify it to make your code self-describing.
+          </p>
+
+        </conbody>
+
+      </concept>
+
+      <concept id="kudu_default_attribute">
+
+        <title>DEFAULT Attribute</title>
+
+        <conbody>
+
+          <p>
+            You can specify a default value for columns in Kudu tables. The default value can be
+            any constant expression, for example, a combination of literal values, arithmetic
+            and string operations. It cannot contain references to columns or non-deterministic
+            function calls.
+          </p>
+
+          <p>
+            The following example shows different kinds of expressions for the
+            <codeph>DEFAULT</codeph> clause. The requirement to use a constant value means that
+            you can fill in a placeholder value such as <codeph>NULL</codeph>, empty string,
+            0, -1, <codeph>'N/A'</codeph> and so on, but you cannot reference functions or
+            column names. Therefore, you cannot use <codeph>DEFAULT</codeph> to do things such as
+            automatically making an uppercase copy of a string value, storing Boolean values based
+            on tests of other columns, or adding or subtracting one from the value of another
+            column representing a sequence number.
+          </p>
+
+<codeblock>
+CREATE TABLE default_vals
+(
+  id BIGINT PRIMARY KEY,
+  name STRING NOT NULL DEFAULT 'unknown',
+  address STRING DEFAULT upper('no fixed address'),
+  age INT DEFAULT -1,
+  earthling BOOLEAN DEFAULT TRUE,
+  planet_of_origin STRING DEFAULT 'Earth',
+  optional_col STRING DEFAULT NULL
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</codeblock>
+
+          <note>
+            <p>
+              When designing an entirely new schema, prefer to use <codeph>NULL</codeph> as the
+              placeholder for any unknown or missing values, because that is the universal convention
+              among database systems. Null values can be stored efficiently, and easily checked with the
+              <codeph>IS NULL</codeph> or <codeph>IS NOT NULL</codeph> operators. The <codeph>DEFAULT</codeph>
+              attribute is appropriate when ingesting data that already has an established convention for
+              representing unknown or missing values, or where the vast majority of rows have some common
+              non-null value.
+            </p>
+          </note>
+
+        </conbody>
+
+      </concept>
+
+      <concept id="kudu_encoding_attribute">
+
+        <title>ENCODING Attribute</title>
+
+        <conbody>
+
+          <p>
+            Each column in a Kudu table can optionally use an encoding, a low-overhead form of
+            compression that reduces the size of the data on disk, at the cost of additional
+            CPU cycles to reconstruct the original values during queries. Typically, highly
+            compressible data benefits from the reduced I/O needed to read the data back from
+            disk. By default, each column uses the <q>plain</q> encoding, where the data is
+            stored unchanged.
+          </p>
+
+          <p>
+            The encoding keywords that Impala recognizes are:
+
+            <ul>
+              <li>
+                <p>
+                  <codeph>AUTO_ENCODING</codeph>: use the default encoding based on the column
+                  type; currently always the same as <codeph>PLAIN_ENCODING</codeph>, but subject to
+                  change in the future.
+                </p>
+              </li>
+              <li>
+                <p>
+                  <codeph>PLAIN_ENCODING</codeph>: leave the value in its original binary format.
+                </p>
+              </li>
+              <!-- GROUP_VARINT is internal use only, not documenting that although it shows up
+                   in parser error messages. -->
+              <li>
+                <p>
+                  <codeph>RLE</codeph>: compress repeated values (when sorted in primary key
+                  order) by including a count.
+                </p>
+              </li>
+              <li>
+                <p>
+                  <codeph>DICT_ENCODING</codeph>: when the number of different string values is
+                  low, replace the original string with a numeric ID.
+                </p>
+              </li>
+              <li>
+                <p>
+                  <codeph>BIT_SHUFFLE</codeph>: rearrange the bits of the values to efficiently
+                  compress sequences of values that are identical or vary only slightly based
+                  on primary key order. The resulting encoded data is also compressed with LZ4.
+                </p>
+              </li>
+              <li>
+                <p>
+                  <codeph>PREFIX_ENCODING</codeph>: compress common prefixes in string values; mainly for use internally within Kudu.
+                </p>
+              </li>
+            </ul>
+          </p>
+
+<!--
+UNKNOWN, AUTO_ENCODING, PLAIN_ENCODING, PREFIX_ENCODING, GROUP_VARINT, RLE, DICT_ENCODING, BIT_SHUFFLE
+
+No joy trying keywords UNKNOWN, or GROUP_VARINT with TINYINT and BIGINT.
+-->
+
+          <p>
+            The following example shows the Impala keywords representing the encoding types.
+            (The Impala keywords match the symbolic names used within Kudu.)
+            For usage guidelines on the different kinds of encoding, see
+            <xref href="https://kudu.apache.org/docs/schema_design.html" scope="external" format="html">the Kudu documentation</xref>.
+            The <codeph>DESCRIBE</codeph> output shows how the encoding is reported after
+            the table is created, and that omitting the encoding (in this case, for the
+            <codeph>id</codeph> column) is the same as specifying <codeph>AUTO_ENCODING</codeph>.
+          </p>
+
+<codeblock>
+CREATE TABLE various_encodings
+(
+  id BIGINT PRIMARY KEY,
+  c1 BIGINT ENCODING PLAIN_ENCODING,
+  c2 BIGINT ENCODING AUTO_ENCODING,
+  c3 TINYINT ENCODING BIT_SHUFFLE,
+  c4 DOUBLE ENCODING BIT_SHUFFLE,
+  c5 BOOLEAN ENCODING RLE,
+  c6 STRING ENCODING DICT_ENCODING,
+  c7 STRING ENCODING PREFIX_ENCODING
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+-- Some columns are omitted from the output for readability.
+describe various_encodings;
++------+---------+-------------+----------+-----------------+
+| name | type    | primary_key | nullable | encoding        |
++------+---------+-------------+----------+-----------------+
+| id   | bigint  | true        | false    | AUTO_ENCODING   |
+| c1   | bigint  | false       | true     | PLAIN_ENCODING  |
+| c2   | bigint  | false       | true     | AUTO_ENCODING   |
+| c3   | tinyint | false       | true     | BIT_SHUFFLE     |
+| c4   | double  | false       | true     | BIT_SHUFFLE     |
+| c5   | boolean | false       | true     | RLE             |
+| c6   | string  | false       | true     | DICT_ENCODING   |
+| c7   | string  | false       | true     | PREFIX_ENCODING |
++------+---------+-------------+----------+-----------------+
+</codeblock>
+
+        </conbody>
+
+      </concept>
+
+      <concept id="kudu_compression_attribute">
+
+        <title>COMPRESSION Attribute</title>
+
+        <conbody>
+
+          <p>
+            You can specify a compression algorithm to use for each column in a Kudu table. This
+            attribute imposes more CPU overhead when retrieving the values than the
+            <codeph>ENCODING</codeph> attribute does. Therefore, use it primarily for columns with
+            long strings that do not benefit much from the less-expensive <codeph>ENCODING</codeph>
+            attribute.
+          </p>
+
+          <p>
+            The choices for <codeph>COMPRESSION</codeph> are <codeph>LZ4</codeph>,
+            <codeph>SNAPPY</codeph>, and <codeph>ZLIB</codeph>.
+          </p>
+
+          <note>
+            <p>
+              Columns that use the <codeph>BIT_SHUFFLE</codeph> encoding are already compressed
+              using <codeph>LZ4</codeph>, and so typically do not need any additional
+              <codeph>COMPRESSION</codeph> attribute.
+            </p>
+          </note>
+
+          <p>
+            The following example shows design considerations for several
+            <codeph>STRING</codeph> columns with different distribution characteristics, leading
+            to choices for both the <codeph>ENCODING</codeph> and <codeph>COMPRESSION</codeph>
+            attributes. The <codeph>country</codeph> values come from a specific set of strings,
+            therefore this column is a good candidate for dictionary encoding. The
+            <codeph>post_id</codeph> column contains an ascending sequence of integers, where
+            several leading bits are likely to be all zeroes, therefore this column is a good
+            candidate for bitshuffle encoding. The <codeph>body</codeph>
+            column and the corresponding columns for translated versions tend to be long unique
+            strings that are not practical to use with any of the encoding schemes, therefore
+            they employ the <codeph>COMPRESSION</codeph> attribute instead. The ideal compression
+            codec in each case would require some experimentation to determine how much space
+            savings it provided and how much CPU overhead it added, based on real-world data.
+          </p>
+
+<codeblock>
+CREATE TABLE blog_posts
+(
+  user_id STRING ENCODING DICT_ENCODING,
+  post_id BIGINT ENCODING BIT_SHUFFLE,
+  subject STRING ENCODING PLAIN_ENCODING,
+  body STRING COMPRESSION LZ4,
+  spanish_translation STRING COMPRESSION SNAPPY,
+  esperanto_translation STRING COMPRESSION ZLIB,
+  PRIMARY KEY (user_id, post_id)
+) PARTITION BY HASH(user_id, post_id) PARTITIONS 2 STORED AS KUDU;
+</codeblock>
+
+        </conbody>
+
+      </concept>
+
+      <concept id="kudu_block_size_attribute">
+
+        <title>BLOCK_SIZE Attribute</title>
+
+        <conbody>
+
+          <p>
+            Although Kudu does not use HDFS files internally, and thus is not affected by
+            the HDFS block size, it does have an underlying unit of I/O called the
+            <term>block size</term>. The <codeph>BLOCK_SIZE</codeph> attribute lets you set the
+            block size for any column.
+          </p>
+
+          <p>
+            The block size attribute is a relatively advanced feature. Refer to
+            <xref href="https://kudu.apache.org/docs/index.html" scope="external" format="html">the Kudu documentation</xref>
+            for usage details.
+          </p>
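+          <p>
+            The attribute follows the column type in the column definition. (The table name and
+            block size values below are purely illustrative, not tuning recommendations.)
+          </p>
+
+<codeblock>
+CREATE TABLE block_size_demo
+(
+  id BIGINT PRIMARY KEY,
+  c1 STRING BLOCK_SIZE 16384,
+  c2 BIGINT BLOCK_SIZE 4096
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</codeblock>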
+
+<!-- Commenting out this example for the time being.
+<codeblock>
+CREATE TABLE performance_for_benchmark_xyz
+(
+  id BIGINT PRIMARY KEY,
+  col1 BIGINT BLOCK_SIZE 4096,
+  col2 STRING BLOCK_SIZE 16384,
+  col3 SMALLINT BLOCK_SIZE 2048
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</codeblock>
+-->
+
+        </conbody>
+
+      </concept>
+
+    </concept>
+
+    <concept id="kudu_partitioning">
+
+      <title>Partitioning for Kudu Tables</title>
+
+      <conbody>
+
+        <p>
+          Kudu tables use special mechanisms to distribute data among the underlying
+          tablet servers. Although we refer to such tables as partitioned tables, they are
+          distinguished from traditional Impala partitioned tables by use of different clauses
+          on the <codeph>CREATE TABLE</codeph> statement. Kudu tables use
+          <codeph>PARTITION BY</codeph>, <codeph>HASH</codeph>, <codeph>RANGE</codeph>, and
+          range specification clauses rather than the <codeph>PARTITIONED BY</codeph> clause
+          for HDFS-backed tables, which specifies only a column name and creates a new partition for each
+          different value.
+        </p>
+
+        <p>
+          For background information and architectural details about the Kudu partitioning
+          mechanism, see
+          <xref href="https://kudu.apache.org/kudu.pdf" scope="external" format="html">the Kudu white paper, section 3.2</xref>.
+        </p>
+
+<!-- Hiding but leaving in place for the moment, in case the white paper discussion isn't enough.
+        <p>
+          With Kudu tables, all of the columns involved in these clauses must be primary key
+          columns. These clauses let you specify different ways to divide the data for each
+          column, or even for different value ranges within a column. This flexibility lets you
+          avoid problems with uneven distribution of data, where the partitioning scheme for
+          HDFS tables might result in some partitions being much larger than others. By setting
+          up an effective partitioning scheme for a Kudu table, you can ensure that the work for
+          a query can be parallelized evenly across the hosts in a cluster.
+        </p>
+-->
+
+        <note>
+          <p>
+            The Impala DDL syntax for Kudu tables is different from that in early Kudu
+            versions, which used an experimental fork of the Impala code. For example, the
+            <codeph>DISTRIBUTE BY</codeph> clause is now <codeph>PARTITION BY</codeph>, the
+            <codeph>INTO <varname>n</varname> BUCKETS</codeph> clause is now
+            <codeph>PARTITIONS <varname>n</varname></codeph>, and the range partitioning syntax
+            is reworked to replace the <codeph>SPLIT ROWS</codeph> clause with more expressive
+            syntax involving comparison operators.
+          </p>
+        </note>
+
+        <p outputclass="toc inpage"/>
+
+      </conbody>
+
+      <concept id="kudu_hash_partitioning">
+        <title>Hash Partitioning</title>
+        <conbody>
+
+          <p>
+            Hash partitioning is the simplest type of partitioning for Kudu tables.
+            For hash-partitioned Kudu tables, inserted rows are divided up between a fixed number
+            of <q>buckets</q> by applying a hash function to the values of the columns specified
+            in the <codeph>HASH</codeph> clause.
+            Hashing ensures that rows with similar values are evenly distributed, instead of
+            clumping together all in the same bucket. Spreading new rows across the buckets this
+            way lets insertion operations work in parallel across multiple tablet servers.
+            However, separating the hashed values can impose overhead on queries: queries with
+            range-based predicates might have to read multiple tablets to retrieve all the
+            relevant values.
+          </p>
+
+<codeblock>
+-- 1M rows with 50 hash partitions = approximately 20,000 rows per partition.
+-- The values in each partition are not sequential, but rather based on a hash function.
+-- Rows 1, 99999, and 123456 might be in the same partition.
+CREATE TABLE million_rows (id string primary key, s string)
+  PARTITION BY HASH(id) PARTITIONS 50
+  STORED AS KUDU;
+
+-- Because the ID values are unique, we expect the rows to be roughly
+-- evenly distributed between the buckets in the destination table.
+INSERT INTO million_rows SELECT * FROM billion_rows ORDER BY id LIMIT 1e6;
+</codeblock>
+
+          <note>
+            <p>
+              The largest number of buckets that you can create with a <codeph>PARTITIONS</codeph>
+              clause varies depending on the number of tablet servers in the cluster, while the smallest is 2.
+              For simplicity, some of the <codeph>CREATE TABLE</codeph> statements throughout this section
+              use <codeph>PARTITIONS 2</codeph> to illustrate the minimum requirements for a Kudu table.
+              For large tables, prefer to use roughly 10 partitions per server in the cluster.
+            </p>
+          </note>
+
+        </conbody>
+      </concept>
+
+      <concept id="kudu_range_partitioning">
+        <title>Range Partitioning</title>
+        <conbody>
+
+          <p>
+            Range partitioning lets you specify partitioning precisely, based on single values or ranges
+            of values within one or more columns. You add one or more <codeph>RANGE</codeph> clauses to the
+            <codeph>CREATE TABLE</codeph> statement, following the <codeph>PARTITION BY</codeph>
+            clause.
+          </p>
+
+          <p>
+            Range-partitioned Kudu tables use one or more range clauses, which include a
+            combination of constant expressions, <codeph>VALUE</codeph> or <codeph>VALUES</codeph>
+            keywords, and comparison operators. (This syntax replaces the <codeph>SPLIT
+            ROWS</codeph> clause used with early Kudu versions.)
+            For the full syntax, see <xref keyref="create_table"/>.
+          </p>
+
+<codeblock><![CDATA[
+-- 50 buckets, all for IDs beginning with a lowercase letter.
+-- Having only a single range enforces the allowed range of values
+-- but does not add any extra parallelism.
+create table million_rows_one_range (id string primary key, s string)
+  partition by hash(id) partitions 50,
+  range (partition 'a' <= values < '{')
+  stored as kudu;
+
+-- 50 buckets for IDs beginning with a lowercase letter
+-- plus 50 buckets for IDs beginning with an uppercase letter.
+-- Total number of buckets = number in the PARTITIONS clause x number of ranges.
+-- We are still enforcing constraints on the primary key values
+-- allowed in the table, and the 2 ranges provide better parallelism
+-- as rows are inserted or the table is scanned.
+create table million_rows_two_ranges (id string primary key, s string)
+  partition by hash(id) partitions 50,
+  range (partition 'a' <= values < '{', partition 'A' <= values < '[')
+  stored as kudu;
+
+-- Same as previous table, with an extra range covering the single key value '00000'.
+create table million_rows_three_ranges (id string primary key, s string)
+  partition by hash(id) partitions 50,
+  range (partition 'a' <= values < '{', partition 'A' <= values < '[', partition value = '00000')
+  stored as kudu;
+
+-- The range partitioning can be displayed with a SHOW command in impala-shell.
+show range partitions million_rows_three_ranges;
++---------------------+
+| RANGE (id)          |
++---------------------+
+| VALUE = "00000"     |
+| "A" <= VALUES < "[" |
+| "a" <= VALUES < "{" |
++---------------------+
+]]>
+</codeblock>
+
+          <note>
+            <p>
+              When defining ranges, be careful to avoid <q>fencepost errors</q> where values at the
+              extreme ends might be included or omitted by accident. For example, in the tables defined
+              in the preceding code listings, the range <codeph><![CDATA["a" <= VALUES < "{"]]></codeph> ensures that
+              any values starting with <codeph>z</codeph>, such as <codeph>za</codeph> or <codeph>zzz</codeph>
+              or <codeph>zzz-ZZZ</codeph>, are all included, by using a less-than operator for the smallest
+              value after all the values starting with <codeph>z</codeph>.
+            </p>
+          </note>
+
+          <p>
+            For range-partitioned Kudu tables, an appropriate range must exist before a data value can be created in the table.
+            Any <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>UPSERT</codeph> statement fails if it tries to
+            create column values that fall outside the specified ranges. The error checking for ranges is performed on the
+            Kudu side; Impala passes the specified range information to Kudu, and Kudu passes back any errors or warnings if the
+            ranges are not valid. (A nonsensical range specification causes an error for a DDL statement, but only a warning
+            for a DML statement.)
+          </p>
+
+          <p>
+            Ranges can be non-contiguous:
+          </p>
+
+<codeblock><![CDATA[
+partition by range (year) (partition 1885 <= values <= 1889, partition 1893 <= values <= 1897)
+
+partition by range (letter_grade) (partition value = 'A', partition value = 'B',
+  partition value = 'C', partition value = 'D', partition value = 'F')
+]]>
+</codeblock>
+
+          <p>
+            The <codeph>ALTER TABLE</codeph> statement with the <codeph>ADD RANGE PARTITION</codeph>
+            or <codeph>DROP RANGE PARTITION</codeph> clauses can be used to add or remove ranges
+            from an existing Kudu table.
+          </p>
+
+<codeblock><![CDATA[
+ALTER TABLE foo ADD RANGE PARTITION 30 <= VALUES < 50;
+ALTER TABLE foo DROP RANGE PARTITION 1 <= VALUES < 5;
+]]>
+</codeblock>
+
+          <p>
+            When a range is added, the new range must not overlap with any existing range;
+            that is, it can only fill in gaps between the existing ranges.
+          </p>
+
+<codeblock><![CDATA[
+alter table test_scores add range partition value = 'E';
+
+alter table year_ranges add range partition 1890 <= values < 1893;
+]]>
+</codeblock>
+
+          <p>
+            When a range is removed, all the associated rows in the table are deleted. (This
+            is true whether the table is internal or external.)
+          </p>
+
+<codeblock><![CDATA[
+alter table test_scores drop range partition value = 'E';
+
+alter table year_ranges drop range partition 1890 <= values < 1893;
+]]>
+</codeblock>
+
+        <p>
+          Kudu tables can also use a combination of hash and range partitioning.
+        </p>
+
+<codeblock><![CDATA[
+partition by hash (school) partitions 10,
+  range (letter_grade) (partition value = 'A', partition value = 'B',
+    partition value = 'C', partition value = 'D', partition value = 'F')
+]]>
+</codeblock>
+
+        </conbody>
+      </concept>
+
+      <concept id="kudu_partitioning_misc">
+        <title>Working with Partitioning in Kudu Tables</title>
+        <conbody>
+
+          <p>
+            To see the current partitioning scheme for a Kudu table, you can use the <codeph>SHOW
+            CREATE TABLE</codeph> statement or the <codeph>SHOW PARTITIONS</codeph> statement. The
+            <codeph>CREATE TABLE</codeph> syntax displayed by <codeph>SHOW CREATE TABLE</codeph>
+            includes all the hash and range clauses that reflect the original table structure,
+            plus any subsequent <codeph>ALTER TABLE</codeph> statements that changed the table
+            structure.
+          </p>
+
+          <p>
+            To see the underlying buckets and partitions for a Kudu table, use the
+            <codeph>SHOW TABLE STATS</codeph> or <codeph>SHOW PARTITIONS</codeph> statement.
+          </p>
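+
+          <p>
+            For example, with a hypothetical Kudu table named <codeph>kudu_table_demo</codeph>,
+            you might examine its structure and partition layout as follows:
+          </p>
+
+<codeblock><![CDATA[
+-- Displays the full table definition, including the partitioning clauses.
+show create table kudu_table_demo;
+
+-- Displays the underlying tablets: hash buckets and range boundaries.
+show partitions kudu_table_demo;
+show table stats kudu_table_demo;
+]]>
+</codeblock>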
+
+        </conbody>
+      </concept>
+
+    </concept>
+
+    <concept id="kudu_timestamps">
+
+      <title>Handling Date, Time, or Timestamp Data with Kudu</title>
+
+      <conbody>
+
+        <p>
+          Because currently a Kudu table cannot have a column with the Impala
+          <codeph>TIMESTAMP</codeph> type, store date/time information as the number
+          of seconds, milliseconds, or microseconds since the Unix epoch date of January 1,
+          1970. Specify the column as <codeph>BIGINT</codeph> in the Impala <codeph>CREATE
+          TABLE</codeph> statement, corresponding to an 8-byte integer (an
+          <codeph>int64</codeph>) in the underlying Kudu table. Then use Impala date/time
+          conversion functions as necessary to produce a numeric, <codeph>TIMESTAMP</codeph>,
+          or <codeph>STRING</codeph> value depending on the context.
+        </p>
+
+        <p>
+          For example, the <codeph>unix_timestamp()</codeph> function returns an integer result
+          representing the number of seconds past the epoch. The <codeph>now()</codeph> function
+          produces a <codeph>TIMESTAMP</codeph> representing the current date and time, which can
+          be passed as an argument to <codeph>unix_timestamp()</codeph>. And string literals
+          representing dates and date/times can be cast to <codeph>TIMESTAMP</codeph>, and from there
+          converted to numeric values. The following examples show how you might store a date/time
+          column as <codeph>BIGINT</codeph> in a Kudu table, but still use string literals and
+          <codeph>TIMESTAMP</codeph> values for convenience.
+        </p>
+
+<codeblock><![CDATA[
+-- now() returns a TIMESTAMP and shows the format for string literals you can cast to TIMESTAMP.
+select now();
++-------------------------------+
+| now()                         |
++-------------------------------+
+| 2017-01-25 23:50:10.132385000 |
++-------------------------------+
+
+-- unix_timestamp() accepts either a TIMESTAMP or an equivalent string literal.
+select unix_timestamp(now());
++-----------------------+
+| unix_timestamp(now()) |
++-----------------------+
+| 1485386670            |
++-----------------------+
+
+select unix_timestamp('2017-01-01');
++------------------------------+
+| unix_timestamp('2017-01-01') |
++------------------------------+
+| 1483228800                   |
++------------------------------+
+
+-- Make a table representing a date/time value as BIGINT.
+-- Construct 1 range partition and 20 associated hash partitions for each year.
+-- Use date/time conversion functions to express the ranges as human-readable dates.
+create table time_series(id bigint, when_exactly bigint, event string, primary key (id, when_exactly))
+	partition by hash (id) partitions 20,
+	range (when_exactly)
+	(
+		partition unix_timestamp('2015-01-01') <= values < unix_timestamp('2016-01-01'),
+		partition unix_timestamp('2016-01-01') <= values < unix_timestamp('2017-01-01'),
+		partition unix_timestamp('2017-01-01') <= values < unix_timestamp('2018-01-01')
+	)
+	stored as kudu;
+
+-- On insert, we can transform a human-readable date/time into a numeric value.
+insert into time_series values (12345, unix_timestamp('2017-01-25 23:24:56'), 'Working on doc examples');
+
+-- On retrieval, we can examine the numeric date/time value or turn it back into a string for readability.
+select id, when_exactly, from_unixtime(when_exactly) as 'human-readable date/time', event
+  from time_series order by when_exactly limit 100;
++-------+--------------+--------------------------+-------------------------+
+| id    | when_exactly | human-readable date/time | event                   |
++-------+--------------+--------------------------+-------------------------+
+| 12345 | 1485386696   | 2017-01-25 23:24:56      | Working on doc examples |
++-------+--------------+--------------------------+-------------------------+
+]]>
+</codeblock>
+
+        <note>
+          <p>
+            If you do high-precision arithmetic involving numeric date/time values, such as
+            dividing millisecond values by 1000 or microsecond values by 1 million, always
+            cast the integer numerator to a <codeph>DECIMAL</codeph> with sufficient precision
+            and scale to avoid any rounding or loss of precision.
+          </p>
+        </note>
+
+<codeblock><![CDATA[
+-- 1,000,001 microseconds = 1.000001 seconds.
+select microseconds,
+  cast (microseconds as decimal(20,7)) / 1e6 as fractional_seconds
+  from table_with_microsecond_column;
++--------------+----------------------+
+| microseconds | fractional_seconds   |
++--------------+----------------------+
+| 1000001      | 1.000001000000000000 |
++--------------+----------------------+
+]]>
+</codeblock>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="kudu_metadata">
+
+      <title>How Impala Handles Kudu Metadata</title>
+
+      <conbody>
+
+        <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
+        <p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
+
+        <p>
+          Because Kudu manages the metadata for its own tables separately from the metastore
+          database, there is a table name stored in the metastore database for Impala to use,
+          and a table name on the Kudu side, and these names can be modified independently
+          through <codeph>ALTER TABLE</codeph> statements.
+        </p>
+
+        <p>
+          To avoid potential name conflicts, the prefix <codeph>impala::</codeph>
+          and the Impala database name are encoded into the underlying Kudu
+          table name:
+        </p>
+
+<codeblock><![CDATA[
+create database some_database;
+use some_database;
+
+create table table_name_demo (x int primary key, y int)
+  partition by hash (x) partitions 2 stored as kudu;
+
+describe formatted table_name_demo;
+...
+kudu.table_name  | impala::some_database.table_name_demo
+]]>
+</codeblock>
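+
+        <p>
+          The two names can then be changed independently. As a sketch (with hypothetical
+          names; exactly which name each statement affects can depend on whether the table
+          is internal or external):
+        </p>
+
+<codeblock><![CDATA[
+-- Changes the table name that Impala uses in the metastore database.
+alter table table_name_demo rename to new_demo_name;
+
+-- Points an external Impala table at a different underlying Kudu table.
+alter table new_demo_name set tblproperties ('kudu.table_name' = 'some_kudu_name');
+]]>
+</codeblock>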
+
+        <p>
+          See <xref keyref="kudu_tables"/> for examples of how to change the name of
+          the Impala table in the metastore database, the name of the underlying Kudu
+          table, or both.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+  </concept>
+
+  <concept id="kudu_etl">
+
+    <title>Loading Data into Kudu Tables</title>
+
+    <conbody>
+
+      <p>
+        Kudu tables are well-suited to use cases where data arrives continuously, in small or
+        moderate volumes. To bring data into Kudu tables, use the Impala <codeph>INSERT</codeph>
+        and <codeph>UPSERT</codeph> statements. The <codeph>LOAD DATA</codeph> statement does
+        not apply to Kudu tables.
       </p>
 
       <p>
-        Kudu can do extra optimizations for queries that refer to the primary key columns in the
-        <codeph>WHERE</codeph> clause. It is not crucial though to include the primary key columns in the
-        <codeph>WHERE</codeph> clause of every query. The benefit is mainly for partitioned tables,
-        which divide the data among various tablet servers based on the distribution of
-        data values in some or all of the primary key columns.
+        Because Kudu manages its own storage layer that is optimized for smaller block sizes than
+        HDFS, and performs its own housekeeping to keep data evenly distributed, it is not
+        subject to the <q>many small files</q> issue and does not need explicit reorganization
+        and compaction as the data grows over time. The partitions within a Kudu table can be
+        specified to cover a variety of possible data distributions, instead of hardcoding a new
+        partition for each new day, hour, and so on, which can lead to inefficient,
+        hard-to-scale, and hard-to-manage partition schemes with HDFS tables.
+      </p>
+
+      <p>
+        Your strategy for performing ETL or bulk updates on Kudu tables should take into account
+        the limitations on consistency for DML operations.
+      </p>
+
+      <p>
+        Make <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, and <codeph>UPSERT</codeph>
+        operations <term>idempotent</term>: that is, able to be applied multiple times and still
+        produce an identical result.
+      </p>
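+
+      <p>
+        For example, an <codeph>UPSERT</codeph> statement with fixed values is naturally
+        idempotent, which makes it safe to re-run after a timeout or failure. (The table
+        and column names here are hypothetical.)
+      </p>
+
+<codeblock><![CDATA[
+-- Running this statement once or several times leaves the table in the
+-- same state: a repeated attempt updates the existing row rather than
+-- creating a duplicate or failing.
+upsert into sensor_readings (id, reading) values (12345, 98.6);
+]]>
+</codeblock>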
+
+      <p>
+        If a bulk operation is in danger of exceeding capacity limits due to timeouts or high
+        memory usage, split it into a series of smaller operations.
+      </p>
+
+      <p>
+        Avoid running concurrent ETL operations where the end results depend on precise
+        ordering. In particular, do not rely on an <codeph>INSERT ... SELECT</codeph> statement
+        that selects from the same table into which it is inserting, unless you include extra
+        conditions in the <codeph>WHERE</codeph> clause to avoid reading the newly inserted rows
+        within the same statement.
+      </p>
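+
+      <p>
+        For example (using hypothetical table and column names), a <codeph>WHERE</codeph>
+        clause on a key column can exclude the rows inserted by the statement itself:
+      </p>
+
+<codeblock><![CDATA[
+-- The new rows all have id >= 1000000, so the WHERE clause guarantees
+-- the SELECT part never reads rows inserted by this same statement.
+insert into events select id + 1000000, event_time, payload
+  from events where id < 1000000;
+]]>
+</codeblock>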
+
+      <p>
+        Because relationships between tables cannot be enforced by Impala and Kudu, and cannot
+        be committed or rolled back together, do not expect transactional semantics for
+        multi-table operations.
       </p>
 
     </conbody>
@@ -110,52 +1168,149 @@ under the License.
 
   <concept id="kudu_dml">
 
-    <title>Impala DML Support for Kudu Tables</title>
+    <title>Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE, UPSERT)</title>
+
+    <prolog>
+      <metadata>
+        <data name="Category" value="DML"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+
+      <p>
+        Impala supports certain DML statements only for Kudu tables. The <codeph>UPDATE</codeph>
+        and <codeph>DELETE</codeph> statements let you modify data within Kudu tables without
+        rewriting substantial amounts of table data. The <codeph>UPSERT</codeph> statement acts
+        as a combination of <codeph>INSERT</codeph> and <codeph>UPDATE</codeph>, inserting rows
+        where the primary key does not already exist, and updating the non-primary key columns
+        where the primary key does already exist in the table.
+      </p>
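+
+      <p>
+        For example (using hypothetical table and column names):
+      </p>
+
+<codeblock><![CDATA[
+-- If no row with id=1 exists, this inserts a new row.
+-- If a row with id=1 already exists, this updates its s column.
+upsert into kudu_demo_table (id, s) values (1, 'hello');
+]]>
+</codeblock>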
+
+      <p>
+        The <codeph>INSERT</codeph> statement for Kudu tables honors the unique and <codeph>NOT
+        NULL</codeph> requirements for the primary key columns.
+      </p>
+
+      <p>
+        Because Impala and Kudu do not support transactions, the effects of any
+        <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>DELETE</codeph> statement
+        are immediately visible. For example, you cannot do a sequence of
+        <codeph>UPDATE</codeph> statements and only make the changes visible after all the
+        statements are finished. Also, if a DML statement fails partway through, any rows that
+        were already inserted, deleted, or changed remain in the table; there is no rollback
+        mechanism to undo the changes.
+      </p>
+
+      <p>
+        In particular, an <codeph>INSERT ... SELECT</codeph> statement that refers to the table
+        being inserted into might insert more rows than expected, because the
+        <codeph>SELECT</codeph> part of the statement sees some of the new rows being inserted
+        and processes them again.
+      </p>
+
+      <note>
+        <p>
+          The <codeph>LOAD DATA</codeph> statement, which involves manipulation of HDFS data files,
+          does not apply to Kudu tables.
+        </p>
+      </note>
+
+    </conbody>
+
+  </concept>
+
+  <concept id="kudu_consistency">
+
+    <title>Consistency Considerations for Kudu Tables</title>
 
     <conbody>
 
       <p>
-        Impala supports certain DML statements for Kudu tables only. The <codeph>UPDATE</codeph> and
-        <codeph>DELETE</codeph> statements let you modify data within Kudu tables without rewriting substantial
-        amounts of table data.
+        Kudu tables have consistency characteristics such as uniqueness, controlled by the
+        primary key columns, and non-nullable columns. The emphasis for consistency is on
+        preventing duplicate or incomplete data from being stored in a table.
       </p>
 
       <p>
-        The <codeph>INSERT</codeph> statement for Kudu tables honors the unique and non-<codeph>NULL</codeph>
-        requirements for the primary key columns.
+        Currently, Kudu does not enforce strong consistency for order of operations, total
+        success or total failure of a multi-row statement, or data that is read while a write
+        operation is in progress. Changes are applied atomically to each row, but not applied
+        as a single unit to all rows affected by a multi-row DML statement. That is, Kudu does
+        not currently have atomic multi-row statements or isolation between statements.
       </p>
 
       <p>
-        Because Impala and Kudu do not support transactions, the effects of any <codeph>INSERT</codeph>,
-        <codeph>UPDATE</codeph>, or <codeph>DELETE</codeph> statement are immediately visible. For example, you
-        cannot do a sequence of <codeph>UPDATE</codeph> statements and only make the change visible after all the
-        statements are finished. Also, if a DML statement fails partway through, any rows that were already
-        inserted, deleted, or changed remain in the table; there is no rollback mechanism to undo the changes.
+        If some rows are rejected during a DML operation because of duplicate primary
+        key values, <codeph>NOT NULL</codeph> constraint violations, and so on, the statement
+        succeeds with a warning. Impala still inserts, deletes, or updates the other rows that
+        are not affected by the constraint violation.
+      </p>
+
+      <p>
+        Consequently, the number of rows affected by a DML operation on a Kudu table might be
+        different than you expect.
+      </p>
+
+      <p>
+        Because there is no strong consistency guarantee for information being inserted into,
+        deleted from, or updated across multiple tables simultaneously, consider denormalizing
+        the data where practical. That is, if you run separate <codeph>INSERT</codeph>
+        statements to insert related rows into two different tables, one <codeph>INSERT</codeph>
+        might fail while the other succeeds, leaving the data in an inconsistent state. Even if
+        both inserts succeed, a join query might happen during the interval between the
+        completion of the first and second statements, and the query would encounter incomplete,
+        inconsistent data. Denormalizing the data into a single wide table can reduce the
+        possibility of inconsistency due to multi-table operations.
+      </p>
+
+      <p>
+        Information about the number of rows affected by a DML operation is reported in
+        <cmdname>impala-shell</cmdname> output, and in the <codeph>PROFILE</codeph> output, but
+        is not currently reported to HiveServer2 clients such as JDBC or ODBC applications.
       </p>
 
     </conbody>
 
   </concept>
 
-  <concept id="kudu_partitioning">
+  <concept id="kudu_security">
 
-    <title>Partitioning for Kudu Tables</title>
+    <title>Security Considerations for Kudu Tables</title>
 
     <conbody>
 
       <p>
-        Kudu tables use special mechanisms to evenly distribute data among the underlying tablet servers. Although
-        we refer to such tables as partitioned tables, they are distinguished from traditional Impala partitioned
-        tables by use of different clauses on the <codeph>CREATE TABLE</codeph> statement. Partitioned Kudu tables
-        use <codeph>PARTITION BY</codeph>, <codeph>HASH</codeph>, and <codeph>RANGE</codeph> clauses rather than
-        the traditional <codeph>PARTITIONED BY</codeph> clause. All of the columns involved in these clauses must
-        be primary key columns. These clauses let you specify different ways to divide the data for each column,
-        or even for different value ranges within a column. This flexibility lets you avoid problems with uneven
-        distribution of data, where the partitioning scheme for HDFS tables might result in some partitions being
-        much larger than others. By setting up an effective partitioning scheme for a Kudu table, you can ensure
-        that the work for a query can be parallelized evenly across the hosts in a cluster.
+        Security for Kudu tables involves:
       </p>
 
+      <ul>
+        <li>
+          <p>
+            Sentry authorization.
+          </p>
+          <p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
+        </li>
+
+        <li>
+          <p>
+            Lineage tracking.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Auditing.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Redaction of sensitive information from log files.
+          </p>
+        </li>
+      </ul>
+
     </conbody>
 
   </concept>
@@ -167,18 +1322,66 @@ under the License.
     <conbody>
 
       <p>
-        For queries involving Kudu tables, Impala can delegate much of the work of filtering the result set to
-        Kudu, avoiding some of the I/O involved in full table scans of tables containing HDFS data files. This type
-        of optimization is especially effective for partitioned Kudu tables, where the Impala query
-        <codeph>WHERE</codeph> clause refers to one or more primary key columns that are also used as partition key
-        columns. For example, if a partitioned Kudu table uses a <codeph>HASH</codeph> clause for
-        <codeph>col1</codeph> and a <codeph>RANGE</codeph> clause for <codeph>col2</codeph>, a query using a clause
-        such as <codeph>WHERE col1 IN (1,2,3) AND col2 &gt; 100</codeph> can determine exactly which tablet servers
-        contain relevant data, and therefore parallelize the query very efficiently.
+        For queries involving Kudu tables, Impala can delegate much of the work of filtering the
+        result set to Kudu, avoiding some of the I/O involved in full table scans of tables
+        containing HDFS data files. This type of optimization is especially effective for
+        partitioned Kudu tables, where the Impala query <codeph>WHERE</codeph> clause refers to
+        one or more primary key columns that are also used as partition key columns. For
+        example, if a partitioned Kudu table uses a <codeph>HASH</codeph> clause for
+        <codeph>col1</codeph> and a <codeph>RANGE</codeph> clause for <codeph>col2</codeph>, a
+        query using a clause such as <codeph>WHERE col1 IN (1,2,3) AND col2 &gt; 100</codeph>
+        can determine exactly which tablet servers contain relevant data, and therefore
+        parallelize the query very efficiently.
       </p>
 
+      <p>
+        See <xref keyref="explain"/> for examples of evaluating the effectiveness of
+        the predicate pushdown for a specific query against a Kudu table.
+      </p>
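+
+      <p>
+        As a quick check (using hypothetical table and column names), run the query
+        through <codeph>EXPLAIN</codeph> and look for the predicates listed under the
+        Kudu scan node of the plan:
+      </p>
+
+<codeblock><![CDATA[
+explain select count(*) from kudu_demo_table where col1 in (1,2,3) and col2 > 100;
+]]>
+</codeblock>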
+
+      <!-- Hide until subtopics are ready to display. -->
+      <p outputclass="toc inpage" audience="hidden"/>
+
     </conbody>
 
+    <concept id="kudu_vs_parquet" audience="hidden">
+    <!-- To do: if there is enough real-world experience in future to have a
+         substantive discussion of this subject, revisit this topic and
+         consider unhiding it. -->
+
+      <title>How Kudu Works with Column-Oriented Operations</title>
+
+      <conbody>
+
+        <p>
+          For immutable data, Impala is often used with Parquet tables due to the efficiency of
+          the column-oriented Parquet layout. This section describes how Kudu stores and
+          retrieves columnar data, to help you understand performance and storage considerations
+          of Kudu tables as compared with Parquet tables.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="kudu_memory" audience="hidden">
+    <!-- To do: if there is enough real-world experience in future to have a
+         substantive discussion of this subject, revisit this topic and
+         consider unhiding it. -->
+
+      <title>Memory Usage for Operations on Kudu Tables</title>
+
+      <conbody>
+
+        <p>
+          The Apache Kudu architecture, topology, and data storage techniques result in
+          different patterns of memory usage for Impala statements than with HDFS-backed tables.
+        </p>
+
+      </conbody>
+
+    </concept>
+
   </concept>
 
 </concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_literals.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_literals.xml b/docs/topics/impala_literals.xml
index d0d475c..22404d9 100644
--- a/docs/topics/impala_literals.xml
+++ b/docs/topics/impala_literals.xml
@@ -397,6 +397,24 @@ insert into t1 partition(x=NULL, y) select c1, c3  from some_other_table;</codeb
         <codeph>nullifzero()</codeph>, and <codeph>zeroifnull()</codeph>. See
         <xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
       </p>
+
+      <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+      <p rev="kudu">
+        Columns in Kudu tables have an attribute that specifies whether or not they can contain
+        <codeph>NULL</codeph> values. A column with a <codeph>NULL</codeph> attribute can contain
+        nulls. A column with a <codeph>NOT NULL</codeph> attribute cannot contain any nulls, and
+        an <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or <codeph>UPSERT</codeph> statement
+        will skip any row that attempts to store a null in a column designated as <codeph>NOT NULL</codeph>.
+        Kudu tables default to the <codeph>NULL</codeph> setting for each column, except columns that
+        are part of the primary key.
+      </p>
+      <p rev="kudu">
+        In addition to columns with the <codeph>NOT NULL</codeph> attribute, Kudu tables also have
+        restrictions on <codeph>NULL</codeph> values in columns that are part of the primary key for
+        a table. No column that is part of the primary key in a Kudu table can contain any
+        <codeph>NULL</codeph> values.
+      </p>
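+
+      <p rev="kudu">
+        For example (using hypothetical table and column names):
+      </p>
+
+<codeblock rev="kudu"><![CDATA[
+-- id is a primary key column, so it is implicitly NOT NULL.
+-- s can contain nulls; t cannot.
+create table null_demo (id bigint primary key, s string null, t string not null)
+  partition by hash (id) partitions 2 stored as kudu;
+]]>
+</codeblock>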
+
     </conbody>
   </concept>
 </concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_map.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_map.xml b/docs/topics/impala_map.xml
index 3d03129..6fb697b 100644
--- a/docs/topics/impala_map.xml
+++ b/docs/topics/impala_map.xml
@@ -85,6 +85,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
         <li/>
       </ul>
 
+      <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+      <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
       <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
       <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_partitioning.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_partitioning.xml b/docs/topics/impala_partitioning.xml
index 4d723ad..1f70d2b 100644
--- a/docs/topics/impala_partitioning.xml
+++ b/docs/topics/impala_partitioning.xml
@@ -575,7 +575,7 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
 
   </concept>
 
-  <concept rev="kudu" id="partition_kudu" audience="hidden">
+  <concept rev="kudu 2.8.0" id="partition_kudu">
 
     <title>Using Partitioning with Kudu Tables</title>
 
@@ -593,6 +593,12 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
         columns.
       </p>
 
+      <p>
+        See <xref href="impala_kudu.xml#kudu_partitioning"/> for
+        details and examples of the partitioning techniques
+        for Kudu tables.
+      </p>
+
     </conbody>
 
   </concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_refresh.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_refresh.xml b/docs/topics/impala_refresh.xml
index 8244aa4..7897ecd 100644
--- a/docs/topics/impala_refresh.xml
+++ b/docs/topics/impala_refresh.xml
@@ -333,6 +333,11 @@ ERROR: AnalysisException: Items in partition spec must exactly match the partiti
     <p conref="../shared/impala_common.xml#common/s3_metadata"/>
 
     <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
+    <p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
     <p>
       <xref href="impala_hadoop.xml#intro_metastore"/>,

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_reserved_words.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_reserved_words.xml b/docs/topics/impala_reserved_words.xml
index 423fd43..440120c 100644
--- a/docs/topics/impala_reserved_words.xml
+++ b/docs/topics/impala_reserved_words.xml
@@ -82,7 +82,9 @@ avro
 between
 bigint
 <ph rev="1.4.0">binary</ph>
+<ph rev="kudu">blocksize</ph>
 boolean
+<!-- <ph rev="kudu">buckets</ph> -->
 by
 <ph rev="1.4.0">cached</ph>
 <ph rev="2.3.0">cascade</ph>
@@ -95,6 +97,7 @@ change
 column
 columns
 comment
+<ph rev="kudu">compression</ph>
 compute
 create
 cross
@@ -105,15 +108,18 @@ databases
 date
 datetime
 decimal
-<ph rev="2.6.0">delete</ph>
+<ph rev="kudu">default</ph>
+<ph rev="kudu">delete</ph>
 delimited
 desc
 describe
 distinct
+<!-- <ph rev="kudu">distribute</ph> -->
 div
 double
 drop
 else
+<ph rev="kudu">encoding</ph>
 end
 escaped
 exists
@@ -136,10 +142,10 @@ function
 functions
 <ph rev="2.1.0">grant</ph>
 group
-<ph rev="2.6.0">hash</ph>
+<ph rev="kudu">hash</ph>
 having
 if
-<ph rev="2.6.0">ignore</ph>
+<!-- <ph rev="kudu">ignore</ph> -->
 <ph rev="2.5.0">ilike</ph>
 in
 <ph rev="2.1.0">incremental</ph>
@@ -210,6 +216,7 @@ serdeproperties
 set
 show
 smallint
+<!-- <ph rev="kudu">split</ph> -->
 stats
 stored
 straight_join
@@ -229,8 +236,9 @@ true
 <ph rev="2.0.0">unbounded</ph>
 <ph rev="1.4.0">uncached</ph>
 union
-<ph rev="2.6.0">update</ph>
+<ph rev="kudu">update</ph>
 <ph rev="1.2.1">update_fn</ph>
+<ph rev="kudu">upsert</ph>
 use
 using
 values

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_revoke.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_revoke.xml b/docs/topics/impala_revoke.xml
index 97a912d..b7a0908 100644
--- a/docs/topics/impala_revoke.xml
+++ b/docs/topics/impala_revoke.xml
@@ -108,6 +108,9 @@ object_type ::= TABLE | DATABASE | SERVER | URI
 
     <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
 
+    <p rev="2.8.0" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_show.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_show.xml b/docs/topics/impala_show.xml
index 84e9c0b..4e9e7fc 100644
--- a/docs/topics/impala_show.xml
+++ b/docs/topics/impala_show.xml
@@ -28,6 +28,7 @@ under the License.
       <data name="Category" value="Developers"/>
       <data name="Category" value="Data Analysts"/>
       <data name="Category" value="Reports"/>
+      <data name="Category" value="Kudu"/>
     </metadata>
   </prolog>
 
@@ -49,7 +50,8 @@ SHOW TABLES [IN <varname>database_name</varname>] [[LIKE] '<varname>pattern</var
 <ph rev="1.2.1">SHOW TABLE STATS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
 <ph rev="1.2.1">SHOW COLUMN STATS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
 <ph rev="1.4.0">SHOW PARTITIONS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
-SHOW FILES IN [<varname>database_name</varname>.]<varname>table_name</varname> <ph  rev="IMPALA-1654">[PARTITION (<varname>key_col_expression</varname> [, <varname>key_col_expression</varname>]</ph>]
+<ph rev="1.4.0">SHOW <ph rev="kudu">[RANGE]</ph> PARTITIONS [<varname>database_name</varname>.]<varname>table_name</varname></ph>
+SHOW FILES IN [<varname>database_name</varname>.]<varname>table_name</varname> <ph rev="IMPALA-1654">[PARTITION (<varname>key_col_expression</varname> [, <varname>key_col_expression</varname>]</ph>]
 
 <ph rev="2.0.0">SHOW ROLES
 SHOW CURRENT ROLES
@@ -129,7 +131,8 @@ show files in sample_table partition (month like 'J%');
       <note>
         This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage System (S3).
         It does not apply to views.
-        It does not apply to tables mapped onto HBase, because HBase does not use the same file-based storage layout.
+        It does not apply to tables mapped onto HBase <ph rev="kudu">or Kudu</ph>,
+        because those data management systems do not use the same file-based storage layout.
       </note>
 
       <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
@@ -742,6 +745,61 @@ show tables like '*dim*|t*';
 
       <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
 
+      <p rev="kudu">
+        For Kudu tables:
+      </p>
+
+      <ul rev="kudu">
+        <li>
+          <p>
+            The column specifications include attributes such as <codeph>NULL</codeph>,
+            <codeph>NOT NULL</codeph>, <codeph>ENCODING</codeph>, and <codeph>COMPRESSION</codeph>.
+            If you do not specify those attributes in the original <codeph>CREATE TABLE</codeph> statement,
+            the <codeph>SHOW CREATE TABLE</codeph> output displays the defaults that were used.
+          </p>
+        </li>
+        <li>
+          <p>
+            The specifications of any <codeph>RANGE</codeph> clauses are not displayed in full.
+            To see the definition of the range clauses for a Kudu table, use the <codeph>SHOW RANGE PARTITIONS</codeph> statement.
+          </p>
+        </li>
+        <li>
+          <p>
+            The <codeph>TBLPROPERTIES</codeph> output reflects the Kudu master address
+            and the internal Kudu name associated with the Impala table.
+          </p>
+        </li>
+      </ul>
+
+<codeblock rev="kudu">
+show create table numeric_grades_default_letter;
++------------------------------------------------------------------------------------------------+
+| result                                                                                         |
++------------------------------------------------------------------------------------------------+
+| CREATE TABLE user.numeric_grades_default_letter (                                              |
+|   score TINYINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,               |
+|   letter_grade STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION DEFAULT '-', |
+|   student STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,                  |
+|   PRIMARY KEY (score)                                                                          |
+| )                                                                                              |
+| PARTITION BY <b>RANGE (score) (...)</b>                                                               |
+| STORED AS KUDU                                                                                 |
+| TBLPROPERTIES ('kudu.master_addresses'='vd0342.example.com:7051',                              |
+|   'kudu.table_name'='impala::USER.numeric_grades_default_letter')                              |
++------------------------------------------------------------------------------------------------+
+
+show range partitions numeric_grades_default_letter;
++--------------------+
+| RANGE (score)      |
++--------------------+
+| 0 &lt;= VALUES &lt; 50   |
+| 50 &lt;= VALUES &lt; 65  |
+| 65 &lt;= VALUES &lt; 80  |
+| 80 &lt;= VALUES &lt; 100 |
++--------------------+
+</codeblock>
+
       <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
       <p>
@@ -855,6 +913,39 @@ show create table show_create_table_demo;
 
       <p conref="../shared/impala_common.xml#common/show_security"/>
 
+      <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+      <p rev="kudu IMPALA-2830">
+        Because Kudu tables do not have characteristics derived from HDFS, such
+        as number of files, file format, and HDFS cache status, the output of
+        <codeph>SHOW TABLE STATS</codeph> reflects different characteristics
+        that apply to Kudu tables. If the Kudu table is created with the
+        clause <codeph>PARTITIONS 20</codeph>, then the result set of
+        <codeph>SHOW TABLE STATS</codeph> consists of 20 rows, each representing
+        one of the numbered partitions. For example:
+      </p>
+
+<codeblock rev="kudu IMPALA-2830">
+show table stats kudu_table;
++--------+-----------+----------+-----------------------+------------+
+| # Rows | Start Key | Stop Key | Leader Replica        | # Replicas |
++--------+-----------+----------+-----------------------+------------+
+| -1     |           | 00000001 | host.example.com:7050 | 3          |
+| -1     | 00000001  | 00000002 | host.example.com:7050 | 3          |
+| -1     | 00000002  | 00000003 | host.example.com:7050 | 3          |
+| -1     | 00000003  | 00000004 | host.example.com:7050 | 3          |
+| -1     | 00000004  | 00000005 | host.example.com:7050 | 3          |
+...
+</codeblock>
+
+      <p rev="IMPALA-2830">
+        Impala does not compute the number of rows for each partition for
+        Kudu tables. Therefore, you do not need to re-run <codeph>COMPUTE STATS</codeph>
+        when you see -1 in the <codeph># Rows</codeph> column of the output from
+        <codeph>SHOW TABLE STATS</codeph>. That column always shows -1 for
+        all Kudu tables. 
+      </p>
+
       <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
       <p>
@@ -959,6 +1050,14 @@ show table stats store_sales;
 
       <p conref="../shared/impala_common.xml#common/show_security"/>
 
+      <p rev="kudu IMPALA-2830">
+        The output for <codeph>SHOW COLUMN STATS</codeph> includes
+        the relevant information for Kudu tables.
+        The information for column statistics that originates in the
+        underlying Kudu storage layer is also represented in the
+        metastore database that Impala uses.
+      </p>
+
       <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
       <p>
@@ -1145,8 +1244,31 @@ show column stats store_sales;
 
       <p conref="../shared/impala_common.xml#common/show_security"/>
 
+      <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+      <p rev="kudu IMPALA-4403">
+        The optional <codeph>RANGE</codeph> clause only applies to Kudu tables. It displays only the partitions
+        defined by the <codeph>RANGE</codeph> clause of <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph>.
+      </p>
+
+      <p rev="kudu IMPALA-4403">
+        Although you can specify <codeph>&lt;</codeph> or
+        <codeph>&lt;=</codeph> comparison operators when defining
+        range partitions for Kudu tables, Kudu rewrites them if necessary
+        to represent each range as
+        <codeph><varname>low_bound</varname> &lt;= VALUES &lt; <varname>high_bound</varname></codeph>.
+        This rewriting might involve incrementing one of the boundary values
+        or appending a <codeph>\0</codeph> for string values, so that the
+        partition covers the same range as originally specified.
+      </p>
+
       <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
+      <p>
+        The following example shows the output for a Parquet, text, or other
+        HDFS-backed table partitioned on the <codeph>YEAR</codeph> column:
+      </p>
+
 <codeblock rev="1.4.0">[localhost:21000] &gt; show partitions census;
 +-------+-------+--------+------+---------+
 | year  | #Rows | #Files | Size | Format  |
@@ -1162,6 +1284,53 @@ show column stats store_sales;
 +-------+-------+--------+------+---------+
 </codeblock>
 
+      <p rev="kudu IMPALA-4403">
+        The following example shows the output for a Kudu table
+        using the hash partitioning mechanism. The number of
+        rows in the result set corresponds to the values used
+        in the <codeph>PARTITIONS <varname>N</varname></codeph>
+        clause of <codeph>CREATE TABLE</codeph>.
+      </p>
+
+<codeblock rev="kudu IMPALA-4403"><![CDATA[
+show partitions million_rows_hash;
+
++--------+-----------+----------+-----------------------+--
+| # Rows | Start Key | Stop Key | Leader Replica        | # Replicas
++--------+-----------+----------+-----------------------+--
+| -1     |           | 00000001 | n236.example.com:7050 | 3
+| -1     | 00000001  | 00000002 | n236.example.com:7050 | 3
+| -1     | 00000002  | 00000003 | n336.example.com:7050 | 3
+| -1     | 00000003  | 00000004 | n238.example.com:7050 | 3
+| -1     | 00000004  | 00000005 | n338.example.com:7050 | 3
+....
+| -1     | 0000002E  | 0000002F | n240.example.com:7050 | 3
+| -1     | 0000002F  | 00000030 | n336.example.com:7050 | 3
+| -1     | 00000030  | 00000031 | n240.example.com:7050 | 3
+| -1     | 00000031  |          | n334.example.com:7050 | 3
++--------+-----------+----------+-----------------------+--
+Fetched 50 row(s) in 0.05s
+]]>
+</codeblock>
+
+      <p rev="kudu IMPALA-4403">
+        The following example shows the output for a Kudu table
+        using the range partitioning mechanism:
+      </p>
+
+<codeblock rev="kudu IMPALA-4403"><![CDATA[
+show range partitions million_rows_range;
++-----------------------+
+| RANGE (id)            |
++-----------------------+
+| VALUES < "A"          |
+| "A" <= VALUES < "["   |
+| "a" <= VALUES < "{"   |
+| "{" <= VALUES < "~\0" |
++-----------------------+
+]]>
+</codeblock>
+
       <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
       <p rev="CDH-19187">
         The user ID that the <cmdname>impalad</cmdname> daemon runs under,

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_struct.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_struct.xml b/docs/topics/impala_struct.xml
index d103d95..b4289ab 100644
--- a/docs/topics/impala_struct.xml
+++ b/docs/topics/impala_struct.xml
@@ -112,6 +112,9 @@ type ::= <varname>primitive_type</varname> | <varname>complex_type</varname>
       <li/>
     </ul>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
     <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/>


[8/9] incubator-impala git commit: IMPALA-4854: Fix incremental stats with complex types.

Posted by mj...@apache.org.
IMPALA-4854: Fix incremental stats with complex types.

The bug: Compute incremental stats used to always do a
full stats recomputation for tables with complex types.
The logic for detecting schema changes (e.g. an added
column) did not take into account that columns with
complex types are ignored in the stats computation, and
should therefore not be treated as new columns that do
not yet have stats.
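The core of the fix is to consult one shared ignore predicate in both places that inspect columns. A minimal, self-contained sketch of that idea (simplified stand-in types, not the actual Impala classes):

```java
import java.util.Arrays;
import java.util.List;

public class StatsColumnFilter {
    // Simplified stand-in for Impala's column type metadata.
    enum ColType { INT, STRING, STRUCT, ARRAY, INVALID }

    // Mirrors the idea of ComputeStatsStmt.ignoreColumn(): skip columns
    // whose type cannot carry column stats (invalid or complex types).
    static boolean ignoreColumn(ColType t) {
        return t == ColType.INVALID || t == ColType.STRUCT || t == ColType.ARRAY;
    }

    // Both the stats-query builder and the "is any column missing stats?"
    // check must use the same predicate; otherwise an ignored complex-typed
    // column looks like a column without stats and forces a full rescan.
    static boolean anyStatsEligible(List<ColType> cols) {
        return cols.stream().anyMatch(t -> !ignoreColumn(t));
    }

    public static void main(String[] args) {
        List<ColType> cols = Arrays.asList(ColType.INT, ColType.STRUCT);
        // The STRUCT column is ignored; the INT column still gets stats.
        System.out.println(ignoreColumn(ColType.STRUCT) && anyStatsEligible(cols));
    }
}
```

Applying the predicate consistently is what makes the second `COMPUTE INCREMENTAL STATS` in the new regression test a no-op.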

Testing:
- Added a new regression test
- Locally ran test_compute_stats.py and the FE tests

Change-Id: I6e0335048d688ee25ff55c6628d0f6f8ecc1dd8a
Reviewed-on: http://gerrit.cloudera.org:8080/6033
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/d845413a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/d845413a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/d845413a

Branch: refs/heads/master
Commit: d845413ab8fb0c92fc2d8d0c2a54d0de4dbd7429
Parents: 0c87152
Author: Alex Behm <al...@cloudera.com>
Authored: Wed Feb 15 19:03:47 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Feb 17 06:02:48 2017 +0000

----------------------------------------------------------------------
 .../impala/analysis/ComputeStatsStmt.java       | 24 +++++++++------
 .../QueryTest/compute-stats-incremental.test    | 32 ++++++++++++++++++++
 2 files changed, 47 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/d845413a/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java b/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
index 90c46a8..a4552a6 100644
--- a/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
@@ -163,14 +163,8 @@ public class ComputeStatsStmt extends StatementBase {
 
     for (int i = startColIdx; i < table_.getColumns().size(); ++i) {
       Column c = table_.getColumns().get(i);
-      Type type = c.getType();
+      if (ignoreColumn(c)) continue;
 
-      // Ignore columns with an invalid/unsupported type. For example, complex types in
-      // an HBase-backed table will appear as invalid types.
-      if (!type.isValid() || !type.isSupported()
-          || c.getType().isComplexType()) {
-        continue;
-      }
       // NDV approximation function. Add explicit alias for later identification when
       // updating the Metastore.
       String colRefSql = ToSqlUtils.getIdentSql(c.getName());
@@ -189,6 +183,7 @@ public class ComputeStatsStmt extends StatementBase {
       }
 
       // For STRING columns also compute the max and avg string length.
+      Type type = c.getType();
       if (type.isStringType()) {
         columnStatsSelectList.add("MAX(length(" + colRefSql + "))");
         columnStatsSelectList.add("AVG(length(" + colRefSql + "))");
@@ -313,12 +308,13 @@ public class ComputeStatsStmt extends StatementBase {
         boolean tableIsMissingColStats = false;
 
         // We'll warn the user if a column is missing stats (and therefore we rescan the
-        // whole table), but if all columns are missing stats, the table just doesn't have
-        // any stats and there's no need to warn.
+        // whole table), but if all columns are missing stats, the table just doesn't
+        // have any stats and there's no need to warn.
         boolean allColumnsMissingStats = true;
         String exampleColumnMissingStats = null;
         // Partition columns always have stats, so exclude them from this search
         for (Column col: table_.getNonClusteringColumns()) {
+          if (ignoreColumn(col)) continue;
           if (!col.getStats().hasStats()) {
             if (!tableIsMissingColStats) {
               tableIsMissingColStats = true;
@@ -527,6 +523,16 @@ public class ComputeStatsStmt extends StatementBase {
     }
   }
 
+  /**
+   * Returns true if the given column should be ignored for the purpose of computing
+   * column stats. Columns with an invalid/unsupported/complex type are ignored.
+   * For example, complex types in an HBase-backed table will appear as invalid types.
+   */
+  private boolean ignoreColumn(Column c) {
+    Type t = c.getType();
+    return !t.isValid() || !t.isSupported() || t.isComplexType();
+  }
+
   public String getTblStatsQuery() { return tableStatsQueryStr_; }
   public String getColStatsQuery() { return columnStatsQueryStr_; }
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/d845413a/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test b/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
index 5f98ee7..8e89956 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
@@ -527,3 +527,35 @@ COLUMN, TYPE, #DISTINCT VALUES, #NULLS, MAX SIZE, AVG SIZE
 ---- TYPES
 STRING, STRING, BIGINT, BIGINT, INT, DOUBLE
 ====
+---- QUERY
+# IMPALA-4854: Tests incremental computation in the presence of complex-typed columns.
+create external table complextypestbl_part
+  like parquet '$FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq'
+  partitioned by (p int) stored as parquet;
+alter table complextypestbl_part add partition (p=0)
+  location '$FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/';
+alter table complextypestbl_part add partition (p=1)
+  location '$FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/';
+compute incremental stats complextypestbl_part;
+---- RESULTS
+'Updated 2 partition(s) and 1 column(s).'
+---- TYPES
+STRING
+====
+---- QUERY
+# The table was not changed. Validate that the next compute incremental stats is a no-op.
+compute incremental stats complextypestbl_part;
+---- RESULTS
+---- ERRORS
+No partitions selected for incremental stats update
+====
+---- QUERY
+# Add a new partition and check that only stats for the new partition are computed.
+alter table complextypestbl_part add partition (p=2)
+  location '$FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/';
+compute incremental stats complextypestbl_part;
+---- RESULTS
+'Updated 1 partition(s) and 1 column(s).'
+---- TYPES
+STRING
+====


[7/9] incubator-impala git commit: IMPALA-4282: Remove max length check for type strings.

Posted by mj...@apache.org.
IMPALA-4282: Remove max length check for type strings.

During analysis, Impala used to enforce the default HMS
limit on the number of characters of a type string.
That enforcement has the benefit of a clear error
message, but has the severe drawback of not having
any workaround because the limit is baked into the code.
Going above the 4000 limit is pretty easy with complex
types, and several users have already run into this
issue.

This patch removes the Impala-side enforcement such
that the 4000 limit can be increased by altering
the corresponding columns in the Metastore's backend
database.
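The length problem arises because the HMS stores a complex type as its full SQL string. A small illustrative sketch (hypothetical field names, not Impala code) showing how quickly a struct's type string approaches the 4000-character limit:

```java
public class TypeStringLength {
    // Builds the SQL type string for a flat struct with n int fields,
    // e.g. "struct<f0:int,f1:int,...>" -- roughly what the HMS stores.
    static String structSql(int n) {
        StringBuilder sb = new StringBuilder("struct<");
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(',');
            sb.append("f").append(i).append(":int");
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // A few hundred fields is already enough to cross 4000 characters.
        System.out.println(structSql(500).length() > 4000);  // prints "true"
    }
}
```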

Change-Id: I01f5e503e14feee857bbdf80b3ef4f5a8b57fb2d
Reviewed-on: http://gerrit.cloudera.org:8080/6034
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/0c871529
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/0c871529
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/0c871529

Branch: refs/heads/master
Commit: 0c8715292e93769c71f8160ad85ecd57552ca585
Parents: b7a7636
Author: Alex Behm <al...@cloudera.com>
Authored: Wed Feb 15 23:15:18 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Feb 17 05:30:09 2017 +0000

----------------------------------------------------------------------
 .../org/apache/impala/analysis/ColumnDef.java   |  9 +----
 .../org/apache/impala/util/MetaStoreUtil.java   |  3 --
 .../apache/impala/analysis/AnalyzeDDLTest.java  | 39 --------------------
 3 files changed, 1 insertion(+), 50 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0c871529/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java b/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
index 8993acb..d7217f7 100644
--- a/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
+++ b/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
@@ -176,18 +176,11 @@ public class ColumnDef {
     }
     Preconditions.checkNotNull(type_);
     Preconditions.checkState(type_.isValid());
-    // Check HMS constraints of type and comment.
-    String typeSql = type_.toSql();
-    if (typeSql.length() > MetaStoreUtil.MAX_TYPE_NAME_LENGTH) {
-      throw new AnalysisException(String.format(
-          "Type of column '%s' exceeds maximum type length of %d characters:\n" +
-          "%s has %d characters.", colName_, MetaStoreUtil.MAX_TYPE_NAME_LENGTH,
-          typeSql, typeSql.length()));
-    }
     if (hasKuduOptions()) {
       Preconditions.checkNotNull(analyzer);
       analyzeKuduOptions(analyzer);
     }
+    // Check HMS constraints on comment.
     if (comment_ != null &&
         comment_.length() > MetaStoreUtil.CREATE_MAX_COMMENT_LENGTH) {
       throw new AnalysisException(String.format(

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0c871529/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java b/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
index 95ef700..8c9ac24 100644
--- a/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
+++ b/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
@@ -44,9 +44,6 @@ public class MetaStoreUtil {
   // for Postgres it is 4000, but for most other databases it is 256.
   public static final int CREATE_MAX_COMMENT_LENGTH = 256;
 
-  // Maximum length of the string representation of a type that the HMS can store.
-  public static final int MAX_TYPE_NAME_LENGTH = 4000;
-
   // The longest strings Hive accepts for [serde] property keys.
   public static final int MAX_PROPERTY_KEY_LENGTH = 256;
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/0c871529/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
----------------------------------------------------------------------
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
index 29c59e8..c0703eb 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
@@ -1799,12 +1799,6 @@ public class AnalyzeDDLTest extends FrontendTestBase {
         "Invalid column/field name: ???");
     AnalysisError("create table new_table (i int) PARTITIONED BY (`^&*` int)",
         "Invalid column/field name: ^&*");
-    // Test HMS constraint on type name length.
-    AnalyzesOk(String.format("create table t (i %s)",
-        genTypeSql(MetaStoreUtil.MAX_TYPE_NAME_LENGTH)));
-    AnalysisError(String.format("create table t (i %s)",
-        genTypeSql(MetaStoreUtil.MAX_TYPE_NAME_LENGTH + 1)),
-        "Type of column 'i' exceeds maximum type length of 4000 characters:");
     // Test HMS constraint on comment length.
     AnalyzesOk(String.format("create table t (i int comment '%s')",
         StringUtils.repeat("c", MetaStoreUtil.CREATE_MAX_COMMENT_LENGTH)));
@@ -1865,39 +1859,6 @@ public class AnalyzeDDLTest extends FrontendTestBase {
     }
   }
 
-  /**
-   * Generates a valid type string with exactly the given number of characters.
-   * The type is a struct with at least two fields.
-   * The given length must be at least "struct<s:int,c:int>".length() == 19.
-   */
-  private String genTypeSql(int length) {
-    Preconditions.checkState(length >= 19);
-    StringBuilder result = new StringBuilder();
-    result.append("struct<s:int");
-    // The middle fields always have a fixed length.
-    int midFieldLen = ",f000:int".length();
-    // The last field has a variable length, but this is the minimum.
-    int lastFieldMinLen = ",f:int".length();
-    int fieldIdx = 0;
-    while (result.length() < length - midFieldLen - lastFieldMinLen) {
-      String fieldStr = String.format(",f%03d:int", fieldIdx);
-      result.append(fieldStr);
-      ++fieldIdx;
-    }
-    Preconditions.checkState(result.length() == length - 1 ||
-        result.length() < length - lastFieldMinLen);
-    // Generate last field with a variable length.
-    if (result.length() < length - 1) {
-      int fieldNameLen = length - result.length() - ",:int".length() - 1;
-      Preconditions.checkState(fieldNameLen > 0);
-      String fieldStr = String.format(",%s:int", StringUtils.repeat("f", fieldNameLen));
-      result.append(fieldStr);
-    }
-    result.append(">");
-    Preconditions.checkState(result.length() == length);
-    return result.toString();
-  }
-
   @Test
   public void TestAlterKuduTable() {
     TestUtils.assumeKuduIsSupported();


[6/9] incubator-impala git commit: IMPALA-4933, IMPALA-4931: Simplify SSL initialization on startup

Posted by mj...@apache.org.
IMPALA-4933, IMPALA-4931: Simplify SSL initialization on startup

OpenSSL initialization functions are not threadsafe and are
currently called by both thrift, squeasel, and the Kudu
client. This change forces thrift to initialize OpenSSL on
startup by adding a TSSLSocketFactory to the AuthManager
which initializes OpenSSL upon creation and lives the
lifetime of the process. Then, squeasel is configured to
skip the OpenSSL initialization.

TODO: When the Kudu client supports a flag to disable its
initialization path (KUDU-1738), Impala will call that. In
the meantime, there will continue to be some small
likelihood of a race.

Also updates Squeasel in thirdparty to
c304d3f3481b07bf153979155f02e0aab24d01de.
This is necessary to configure squeasel not to init OpenSSL.

Change-Id: I245a8a001103ddca7f07349faa82bbb5b9fe3ab0
Reviewed-on: http://gerrit.cloudera.org:8080/6027
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Reviewed-by: Henry Robinson <he...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/b7a76361
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/b7a76361
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/b7a76361

Branch: refs/heads/master
Commit: b7a76361bbab7ea4e3c4df7574755cf3f6bc8cae
Parents: 661921b
Author: Matthew Jacobs <mj...@cloudera.com>
Authored: Tue Feb 14 15:41:30 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Feb 17 04:08:32 2017 +0000

----------------------------------------------------------------------
 be/src/rpc/authentication.cc          |  2 ++
 be/src/rpc/authentication.h           | 15 ++++++--
 be/src/thirdparty/squeasel/squeasel.c | 57 +++++++++++++++++++-----------
 be/src/util/CMakeLists.txt            |  5 ---
 be/src/util/webserver.cc              |  4 +++
 5 files changed, 54 insertions(+), 29 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b7a76361/be/src/rpc/authentication.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/authentication.cc b/be/src/rpc/authentication.cc
index db0e608..665a68c 100644
--- a/be/src/rpc/authentication.cc
+++ b/be/src/rpc/authentication.cc
@@ -955,6 +955,8 @@ Status NoAuthProvider::WrapClientTransport(const string& hostname,
 }
 
 Status AuthManager::Init() {
+  ssl_socket_factory_.reset(new TSSLSocketFactory());
+
   bool use_ldap = false;
   const string excl_msg = "--$0 and --$1 are mutually exclusive "
       "and should not be set together";

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b7a76361/be/src/rpc/authentication.h
----------------------------------------------------------------------
diff --git a/be/src/rpc/authentication.h b/be/src/rpc/authentication.h
index 858b9ea..a89d945 100644
--- a/be/src/rpc/authentication.h
+++ b/be/src/rpc/authentication.h
@@ -21,6 +21,7 @@
 
 #include <string>
 #include <thrift/transport/TTransport.h>
+#include <thrift/transport/TSSLSocket.h>
 
 #include "rpc/auth-provider.h"
 #include "sasl/sasl.h"
@@ -36,14 +37,14 @@ using namespace ::apache::thrift::transport;
 namespace impala {
 
 /// System-wide authentication manager responsible for initialising authentication systems,
-/// including Sasl and Kerberos, and for providing auth-enabled Thrift structures to
+/// including SSL, Sasl and Kerberos, and for providing auth-enabled Thrift structures to
 /// servers and clients.
 class AuthManager {
  public:
   static AuthManager* GetInstance() { return AuthManager::auth_manager_; }
 
-  /// Set up internal and external AuthProvider classes.  This does a bunch of flag
-  /// checking and calls each AuthProvider->Start().
+  /// Set up internal and external AuthProvider classes. This also initializes SSL (via
+  /// the creation of ssl_socket_factory_).
   Status Init();
 
   /// Returns the authentication provider to use for "external" communication
@@ -64,6 +65,14 @@ class AuthManager {
   /// don't have to check the auth flags to figure out which auth provider to use.
   boost::scoped_ptr<AuthProvider> internal_auth_provider_;
   boost::scoped_ptr<AuthProvider> external_auth_provider_;
+
+  /// A thrift SSL socket factory must be created and live the lifetime of the process to
+  /// ensure that the thrift OpenSSL initialization code runs at Init(), and is not
+  /// unregistered (which thrift will do when the refcount of TSSLSocketFactory objects
+  /// reach 0), see IMPALA-4933. For simplicity, and because Kudu will expect SSL to be
+  /// initialized, this will be created regardless of whether or not SSL credentials are
+  /// specified. This factory isn't otherwise used.
+  boost::scoped_ptr<TSSLSocketFactory> ssl_socket_factory_;
 };
 
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b7a76361/be/src/thirdparty/squeasel/squeasel.c
----------------------------------------------------------------------
diff --git a/be/src/thirdparty/squeasel/squeasel.c b/be/src/thirdparty/squeasel/squeasel.c
index 7cdcad2..3d27d2d 100644
--- a/be/src/thirdparty/squeasel/squeasel.c
+++ b/be/src/thirdparty/squeasel/squeasel.c
@@ -211,8 +211,8 @@ enum {
   ACCESS_LOG_FILE, ENABLE_DIRECTORY_LISTING, ERROR_LOG_FILE,
   GLOBAL_PASSWORDS_FILE, INDEX_FILES, ENABLE_KEEP_ALIVE, ACCESS_CONTROL_LIST,
   EXTRA_MIME_TYPES, LISTENING_PORTS, DOCUMENT_ROOT, SSL_CERTIFICATE, SSL_PRIVATE_KEY,
-  SSL_PRIVATE_KEY_PASSWORD, NUM_THREADS, RUN_AS_USER, REWRITE, HIDE_FILES,
-  REQUEST_TIMEOUT, NUM_OPTIONS
+  SSL_PRIVATE_KEY_PASSWORD, SSL_GLOBAL_INIT, NUM_THREADS, RUN_AS_USER, REWRITE,
+  HIDE_FILES, REQUEST_TIMEOUT, NUM_OPTIONS
 };
 
 static const char *config_options[] = {
@@ -238,6 +238,7 @@ static const char *config_options[] = {
   "ssl_certificate", NULL,
   "ssl_private_key", NULL,
   "ssl_private_key_password", NULL,
+  "ssl_global_init", "yes",
   "num_threads", "50",
   "run_as_user", NULL,
   "url_rewrite_patterns", NULL,
@@ -4193,11 +4194,38 @@ static int set_ssl_option(struct sq_context *ctx) {
   const char *private_key = ctx->config[SSL_PRIVATE_KEY];
   if (private_key == NULL) private_key = pem;
 
-  // Initialize SSL library
-  SSL_library_init();
-  SSL_load_error_strings();
+  // Initialize SSL library, unless the user has disabled this.
+  int should_init_ssl = (sq_strcasecmp(ctx->config[SSL_GLOBAL_INIT], "yes") == 0);
+  if (should_init_ssl) {
+    SSL_library_init();
+    SSL_load_error_strings();
+    // Initialize locking callbacks, needed for thread safety.
+    // http://www.openssl.org/support/faq.html#PROG1
+    size = sizeof(pthread_mutex_t) * CRYPTO_num_locks();
+    if ((ssl_mutexes = (pthread_mutex_t *) malloc((size_t)size)) == NULL) {
+      cry(fc(ctx), "%s: cannot allocate mutexes: %s", __func__, ssl_error());
+      return 0;
+    }
+
+    for (i = 0; i < CRYPTO_num_locks(); i++) {
+      pthread_mutex_init(&ssl_mutexes[i], NULL);
+    }
+
+    CRYPTO_set_locking_callback(&ssl_locking_callback);
+    CRYPTO_set_id_callback(&ssl_id_callback);
+  }
 
   if ((ctx->ssl_ctx = SSL_CTX_new(SSLv23_server_method())) == NULL) {
+    unsigned long err_code = ERR_peek_error();
+    // If it looks like the error is due to SSL not being initialized,
+    // provide a better error.
+    if (!should_init_ssl &&
+        ERR_GET_LIB(err_code) == ERR_LIB_SSL &&
+        ERR_GET_REASON(err_code) == SSL_R_LIBRARY_HAS_NO_CIPHERS) {
+      cry(fc(ctx), "SSL_CTX_new failed: %s was disabled: OpenSSL must "
+                   "be initialized before starting squeasel",
+                   config_options[SSL_GLOBAL_INIT * 2]);
+    }
     cry(fc(ctx), "SSL_CTX_new (server) error: %s", ssl_error());
     return 0;
   }
@@ -4223,32 +4251,18 @@ static int set_ssl_option(struct sq_context *ctx) {
     (void) SSL_CTX_use_certificate_chain_file(ctx->ssl_ctx, pem);
   }
 
-  // Initialize locking callbacks, needed for thread safety.
-  // http://www.openssl.org/support/faq.html#PROG1
-  size = sizeof(pthread_mutex_t) * CRYPTO_num_locks();
-  if ((ssl_mutexes = (pthread_mutex_t *) malloc((size_t)size)) == NULL) {
-    cry(fc(ctx), "%s: cannot allocate mutexes: %s", __func__, ssl_error());
-    return 0;
-  }
-
-  for (i = 0; i < CRYPTO_num_locks(); i++) {
-    pthread_mutex_init(&ssl_mutexes[i], NULL);
-  }
-
-  CRYPTO_set_locking_callback(&ssl_locking_callback);
-  CRYPTO_set_id_callback(&ssl_id_callback);
 
   return 1;
 }
 
 static void uninitialize_ssl(struct sq_context *ctx) {
   int i;
-  if (ctx->ssl_ctx != NULL) {
+  if (ctx->ssl_ctx != NULL &&
+      sq_strcasecmp(ctx->config[SSL_GLOBAL_INIT], "yes") == 0) {
     CRYPTO_set_locking_callback(NULL);
     for (i = 0; i < CRYPTO_num_locks(); i++) {
       pthread_mutex_destroy(&ssl_mutexes[i]);
     }
-    CRYPTO_set_locking_callback(NULL);
     CRYPTO_set_id_callback(NULL);
   }
 }
@@ -4501,6 +4515,7 @@ static int consume_socket(struct sq_context *ctx, struct socket *sp) {
     clock_get_time(cclock, &mts);
     mach_port_deallocate(mach_task_self(), cclock);
     timeout.tv_sec = mts.tv_sec;
+    timeout.tv_nsec = (long) mts.tv_nsec;
 #endif
 
     ctx->num_free_threads++;

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b7a76361/be/src/util/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/be/src/util/CMakeLists.txt b/be/src/util/CMakeLists.txt
index 6656a38..8039b31 100644
--- a/be/src/util/CMakeLists.txt
+++ b/be/src/util/CMakeLists.txt
@@ -18,11 +18,6 @@
 set(SQUEASEL_SRC_DIR "${CMAKE_SOURCE_DIR}/be/src/thirdparty/squeasel")
 set(MUSTACHE_SRC_DIR "${CMAKE_SOURCE_DIR}/be/src/thirdparty/mustache")
 
-# Without this option Squeasel looks up the SSL library at run-time
-# and may not guess the correct name on some distributions
-SET_SOURCE_FILES_PROPERTIES(${SQUEASEL_SRC_DIR}/squeasel.c PROPERTIES
-  COMPILE_FLAGS -DNO_SSL_DL)
-
 # where to put generated libraries
 set(LIBRARY_OUTPUT_PATH "${BUILD_OUTPUT_ROOT_DIRECTORY}/util")
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b7a76361/be/src/util/webserver.cc
----------------------------------------------------------------------
diff --git a/be/src/util/webserver.cc b/be/src/util/webserver.cc
index d971852..61ef3ea 100644
--- a/be/src/util/webserver.cc
+++ b/be/src/util/webserver.cc
@@ -253,6 +253,10 @@ Status Webserver::Start() {
 
   string key_password;
   if (IsSecure()) {
+    // Impala initializes OpenSSL (see authentication.h).
+    options.push_back("ssl_global_init");
+    options.push_back("false");
+
     options.push_back("ssl_certificate");
     options.push_back(FLAGS_webserver_certificate_file.c_str());
 


[2/9] incubator-impala git commit: [DOCS] Major update to Impala + Kudu page

Posted by mj...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_tables.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_tables.xml b/docs/topics/impala_tables.xml
index 5041111..ae65045 100644
--- a/docs/topics/impala_tables.xml
+++ b/docs/topics/impala_tables.xml
@@ -73,14 +73,16 @@ under the License.
     </ul>
 
     <p rev="2.2.0">
-      Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (CDH 5.4.0 or higher),
-      or on Isilon storage devices (CDH 5.4.3 or higher).  See <xref href="impala_hbase.xml#impala_hbase"/>,
+      Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (<keyword keyref="impala22_full"/> or higher),
+      or on Isilon storage devices (<keyword keyref="impala223_full"/> or higher).  See <xref href="impala_hbase.xml#impala_hbase"/>,
       <xref href="impala_s3.xml#s3"/>, and <xref href="impala_isilon.xml#impala_isilon"/>
       for details about those special kinds of tables.
     </p>
 
     <p conref="../shared/impala_common.xml#common/ignore_file_extensions"/>
 
+    <p outputclass="toc inpage"/>
+
     <p>
       <b>Related statements:</b> <xref href="impala_create_table.xml#create_table"/>,
       <xref href="impala_drop_table.xml#drop_table"/>, <xref href="impala_alter_table.xml#alter_table"/>
@@ -241,6 +243,7 @@ under the License.
 
   <concept id="table_file_formats">
     <title>File Formats</title>
+
     <conbody>
       <p>
         Each table has an associated file format, which determines how Impala interprets the
@@ -273,4 +276,142 @@ under the License.
     </conbody>
   </concept>
 
+  <concept rev="kudu" id="kudu_tables">
+    <title>Kudu Tables</title>
+    <prolog>
+      <metadata>
+        <data name="Category" value="Kudu"/>
+      </metadata>
+    </prolog>
+
+    <conbody>
+      <p>
+        Tables stored in Apache Kudu are treated specially, because Kudu manages its data independently of HDFS files.
+        Some information about the table is stored in the metastore database for use by Impala. Other table metadata is
+        managed internally by Kudu.
+      </p>
+
+      <p>
+        When you create a Kudu table through Impala, it is assigned an internal Kudu table name of the form
+        <codeph>impala::<varname>db_name</varname>.<varname>table_name</varname></codeph>. You can see the Kudu-assigned name
+        in the output of <codeph>DESCRIBE FORMATTED</codeph>, in the <codeph>kudu.table_name</codeph> field of the table properties.
+        The Kudu-assigned name remains the same even if you use <codeph>ALTER TABLE</codeph> to rename the Impala table
+        or move it to a different Impala database. If you issue the statement
+        <codeph>ALTER TABLE <varname>impala_name</varname> SET TBLPROPERTIES('kudu.table_name' = '<varname>different_kudu_table_name</varname>')</codeph>,
+        the effect is different depending on whether the Impala table was created with a regular <codeph>CREATE TABLE</codeph>
+        statement (that is, if it is an internal or managed table), or if it was created with a
+        <codeph>CREATE EXTERNAL TABLE</codeph> statement (and therefore is an external table). Changing the <codeph>kudu.table_name</codeph>
+        property of an internal table physically renames the underlying Kudu table to match the new name.
+        Changing the <codeph>kudu.table_name</codeph> property of an external table switches which underlying Kudu table
+        the Impala table refers to; the underlying Kudu table must already exist.
+      </p>
+
+      <p>
+        The following example shows what happens with both internal and external Kudu tables as the <codeph>kudu.table_name</codeph>
+        property is changed. In practice, external tables are typically used to access underlying Kudu tables that were created
+        outside of Impala, that is, through the Kudu API.
+      </p>
+
+<codeblock>
+-- This is an internal table that we will create and then rename.
+create table old_name (id bigint primary key, s string)
+  partition by hash(id) partitions 2 stored as kudu;
+
+-- Initially, the name OLD_NAME is the same on the Impala and Kudu sides.
+describe formatted old_name;
+...
+| Location:          | hdfs://host.example.com:8020/path/user.db/old_name
+| Table Type:        | MANAGED_TABLE         | NULL
+| Table Parameters:  | NULL                  | NULL
+|                    | DO_NOT_UPDATE_STATS   | true
+|                    | kudu.master_addresses | vd0342.halxg.cloudera.com
+|                    | kudu.table_name       | impala::user.old_name
+
+-- ALTER TABLE RENAME TO changes the Impala name but not the underlying Kudu name.
+alter table old_name rename to new_name;
+
+describe formatted new_name;
+| Location:          | hdfs://host.example.com:8020/path/user.db/new_name
+| Table Type:        | MANAGED_TABLE         | NULL
+| Table Parameters:  | NULL                  | NULL
+|                    | DO_NOT_UPDATE_STATS   | true
+|                    | kudu.master_addresses | vd0342.halxg.cloudera.com
+|                    | kudu.table_name       | impala::user.old_name
+
+-- Setting TBLPROPERTIES changes the underlying Kudu name.
+alter table new_name
+  set tblproperties('kudu.table_name' = 'impala::user.new_name');
+
+describe formatted new_name;
+| Location:          | hdfs://host.example.com:8020/path/user.db/new_name
+| Table Type:        | MANAGED_TABLE         | NULL
+| Table Parameters:  | NULL                  | NULL
+|                    | DO_NOT_UPDATE_STATS   | true
+|                    | kudu.master_addresses | vd0342.halxg.cloudera.com
+|                    | kudu.table_name       | impala::user.new_name
+
+-- Put some data in the table to demonstrate how external tables can map to
+-- different underlying Kudu tables.
+insert into new_name values (0, 'zero'), (1, 'one'), (2, 'two');
+
+-- This external table points to the same underlying Kudu table, NEW_NAME,
+-- as we created above. No need to declare columns or other table aspects.
+create external table kudu_table_alias stored as kudu
+  tblproperties('kudu.table_name' = 'impala::user.new_name');
+
+-- The external table can fetch data from the NEW_NAME table that already
+-- existed and already had data.
+select * from kudu_table_alias limit 100;
++----+------+
+| id | s    |
++----+------+
+| 1  | one  |
+| 0  | zero |
+| 2  | two  |
++----+------+
+
+-- We cannot re-point the external table at a different underlying Kudu table
+-- unless that other underlying Kudu table already exists.
+alter table kudu_table_alias
+  set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
+ERROR:
+TableLoadingException: Error opening Kudu table 'impala::user.yet_another_name',
+  Kudu error: The table does not exist: table_name: "impala::user.yet_another_name"
+
+-- Once the underlying Kudu table exists, we can re-point the external table to it.
+create table yet_another_name (id bigint primary key, x int, y int, s string)
+  partition by hash(id) partitions 2 stored as kudu;
+
+alter table kudu_table_alias
+  set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
+
+-- Now no data is returned because this other table is empty.
+select * from kudu_table_alias limit 100;
+
+-- The Impala table automatically recognizes the table schema of the new table,
+-- for example the extra X and Y columns not present in the original table.
+describe kudu_table_alias;
++------+--------+---------+-------------+----------+...
+| name | type   | comment | primary_key | nullable |...
++------+--------+---------+-------------+----------+...
+| id   | bigint |         | true        | false    |...
+| x    | int    |         | false       | true     |...
+| y    | int    |         | false       | true     |...
+| s    | string |         | false       | true     |...
++------+--------+---------+-------------+----------+...
+</codeblock>
+
+      <p>
+        The <codeph>SHOW TABLE STATS</codeph> output for a Kudu table shows Kudu-specific details about the layout of the table.
+        Instead of information about the number and sizes of files, the information is divided by the Kudu tablets.
+        For each tablet, the output includes the fields
+        <codeph># Rows</codeph> (although this number is not currently computed), <codeph>Start Key</codeph>, <codeph>Stop Key</codeph>, <codeph>Leader Replica</codeph>, and <codeph># Replicas</codeph>.
+        The output of <codeph>SHOW COLUMN STATS</codeph>, illustrating the distribution of values within each column, is the same for Kudu tables
+        as for HDFS-backed tables.
+      </p>
+
+      <p conref="../shared/impala_common.xml#common/kudu_internal_external_tables"/>
+    </conbody>
+  </concept>
+
 </concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_timestamp.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_timestamp.xml b/docs/topics/impala_timestamp.xml
index 9dc9753..2e807b0 100644
--- a/docs/topics/impala_timestamp.xml
+++ b/docs/topics/impala_timestamp.xml
@@ -436,6 +436,9 @@ insert into dates_and_times values
 
     <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <ul>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_truncate_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_truncate_table.xml b/docs/topics/impala_truncate_table.xml
index 3cf5ec3..e2de146 100644
--- a/docs/topics/impala_truncate_table.xml
+++ b/docs/topics/impala_truncate_table.xml
@@ -102,6 +102,9 @@ under the License.
       permission for all the files and directories that make up the table.
     </p>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_no_truncate_table"/>
+
     <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_varchar.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_varchar.xml b/docs/topics/impala_varchar.xml
index 4226c3a..5917724 100644
--- a/docs/topics/impala_varchar.xml
+++ b/docs/topics/impala_varchar.xml
@@ -128,6 +128,9 @@ prefer to use an integer data type with sufficient range (<codeph>INT</codeph>,
 
     <p conref="../shared/impala_common.xml#common/column_stats_variable"/>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
 
     <p conref="../shared/impala_common.xml#common/blobs_are_strings"/>


[4/9] incubator-impala git commit: [DOCS] Major update to Impala + Kudu page

Posted by mj...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_create_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_table.xml b/docs/topics/impala_create_table.xml
index 818553d..77c6452 100644
--- a/docs/topics/impala_create_table.xml
+++ b/docs/topics/impala_create_table.xml
@@ -21,7 +21,13 @@ under the License.
 <concept id="create_table" outputclass="impala sql_statement">
 
   <title outputclass="impala_title sql_statement_title">CREATE TABLE Statement</title>
-  <titlealts audience="PDF"><navtitle>CREATE TABLE</navtitle></titlealts>
+
+  <titlealts audience="PDF">
+
+    <navtitle>CREATE TABLE</navtitle>
+
+  </titlealts>
+
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
@@ -34,7 +40,7 @@ under the License.
       <data name="Category" value="Tables"/>
       <data name="Category" value="Schemas"/>
       <data name="Category" value="S3"/>
-      <!-- <data name="Category" value="Kudu"/> -->
+      <data name="Category" value="Kudu"/>
     </metadata>
   </prolog>
 
@@ -42,8 +48,8 @@ under the License.
 
     <p>
       <indexterm audience="hidden">CREATE TABLE statement</indexterm>
-      Creates a new table and specifies its characteristics. While creating a table, you optionally specify aspects
-      such as:
+      Creates a new table and specifies its characteristics. While creating a table, you
+      optionally specify aspects such as:
     </p>
 
     <ul>
@@ -79,7 +85,10 @@ under the License.
     </p>
 
 <codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
-  (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)
+  (<varname>col_name</varname> <varname>data_type</varname>
+    [COMMENT '<varname>col_comment</varname>']
+    [, ...]
+  )
   [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
   [COMMENT '<varname>table_comment</varname>']
   [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
@@ -92,28 +101,6 @@ under the License.
 </codeblock>
 
     <p>
-      <b>Column definitions inferred from data file:</b>
-    </p>
-
-<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
-  LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'
-  [COMMENT '<varname>table_comment</varname>']
-  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
-  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
-  [
-   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
-  ]
-  [LOCATION '<varname>hdfs_path</varname>']
-  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
-<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
-data_type:
-    <varname>primitive_type</varname>
-  | array_type
-  | map_type
-  | struct_type
-</codeblock>
-
-    <p>
       <b>CREATE TABLE AS SELECT:</b>
     </p>
 
@@ -171,9 +158,106 @@ file_format:
   | TEXTFILE</ph>
 </codeblock>
 
+    <p>
+      <b>Column definitions inferred from data file:</b>
+    </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'
+  [COMMENT '<varname>table_comment</varname>']
+  [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+  [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+  [
+   [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+  ]
+  [LOCATION '<varname>hdfs_path</varname>']
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0">  [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+data_type:
+    <varname>primitive_type</varname>
+  | array_type
+  | map_type
+  | struct_type
+</codeblock>
+
+    <p>
+      <b>Kudu tables:</b>
+    </p>
+
+<codeblock rev="kudu">CREATE TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  (<varname>col_name</varname> <varname>data_type</varname>
+    <ph rev="kudu IMPALA-3719">[<varname>kudu_column_attribute</varname> ...]</ph>
+    [COMMENT '<varname>col_comment</varname>']
+    [, ...]
+    [PRIMARY KEY (<varname>col_name</varname>[, ...])]
+  )
+  <ph rev="kudu">[PARTITION BY <varname>kudu_partition_clause</varname>]</ph>
+  [COMMENT '<varname>table_comment</varname>']
+  STORED AS KUDU
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+</codeblock>
+
+    <p rev="kudu IMPALA-3719">
+      <b>Kudu column attributes:</b>
+<codeblock rev="kudu">
+  PRIMARY KEY
+| [NOT] NULL
+| ENCODING <varname>codec</varname>
+| COMPRESSION <varname>algorithm</varname>
+| DEFAULT <varname>constant</varname>
+| BLOCK_SIZE <varname>number</varname>
+</codeblock>
+    </p>
+
+    <p rev="kudu IMPALA-3719">
+      <b>kudu_partition_clause:</b>
+<codeblock rev="kudu">
+kudu_partition_clause ::= [<varname>hash_clause</varname>] [, <varname>range_clause</varname> [ , <varname>range_clause</varname> ] ]
+
+hash_clause ::=
+  HASH [ (<varname>pk_col</varname> [, ...]) ]
+    PARTITIONS <varname>n</varname>
+
+range_clause ::=
+  RANGE [ (<varname>pk_col</varname> [, ...]) ]
+  (
+    {
+      PARTITION <varname>constant_expression</varname> <varname>range_comparison_operator</varname> VALUES <varname>range_comparison_operator</varname> <varname>constant_expression</varname>
+      | PARTITION VALUE = <varname>constant_expression_or_tuple</varname>
+    }
+   [, ...]
+  )
+
+range_comparison_operator ::= { &lt; | &lt;= }
+</codeblock>
+    </p>
+
+    <p>
+      <b>External Kudu tables:</b>
+    </p>
+
+<codeblock rev="kudu">CREATE EXTERNAL TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  [COMMENT '<varname>table_comment</varname>']
+  STORED AS KUDU
+  [TBLPROPERTIES ('kudu.table_name'='<varname>internal_kudu_name</varname>')]
+</codeblock>
+
+    <p>
+      <b>CREATE TABLE AS SELECT for Kudu tables:</b>
+    </p>
+
+<codeblock rev="kudu">CREATE TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+  [PRIMARY KEY (<varname>col_name</varname>[, ...])]
+  [PARTITION BY <varname>kudu_partition_clause</varname>]
+  [COMMENT '<varname>table_comment</varname>']
+  STORED AS KUDU
+  [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+AS
+  <varname>select_statement</varname></codeblock>
+
     <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
 
-    <!-- Should really have some info up front about all the data types and file formats.
+<!-- Should really have some info up front about all the data types and file formats.
          Consider adding here, or at least making inline links to the relevant keywords
          in the syntax spec above. -->
 
@@ -182,45 +266,47 @@ file_format:
     </p>
 
     <p>
-      Depending on the form of the <codeph>CREATE TABLE</codeph> statement, the column definitions are
-      required or not allowed.
+      Depending on the form of the <codeph>CREATE TABLE</codeph> statement, the column
+      definitions are required or not allowed.
     </p>
 
     <p>
       With the <codeph>CREATE TABLE AS SELECT</codeph> and <codeph>CREATE TABLE LIKE</codeph>
-      syntax, you do not specify the columns at all; the column names and types are derived from the source table, query,
-      or data file.
+      syntax, you do not specify the columns at all; the column names and types are derived from
+      the source table, query, or data file.
     </p>
 
     <p>
       With the basic <codeph>CREATE TABLE</codeph> syntax, you must list one or more columns,
-      its name, type, and optionally a comment, in addition to any columns used as partitioning keys.
-      There is one exception where the column list is not required: when creating an Avro table with the
-      <codeph>STORED AS AVRO</codeph> clause, you can omit the list of columns and specify the same metadata
-      as part of the <codeph>TBLPROPERTIES</codeph> clause.
+      its name, type, and optionally a comment, in addition to any columns used as partitioning
+      keys. There is one exception where the column list is not required: when creating an Avro
+      table with the <codeph>STORED AS AVRO</codeph> clause, you can omit the list of columns
+      and specify the same metadata as part of the <codeph>TBLPROPERTIES</codeph> clause.
     </p>
 
     <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
 
     <p rev="2.3.0">
-      The Impala complex types (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>)
-      are available in <keyword keyref="impala23_full"/> and higher.
-      Because you can nest these types (for example, to make an array of maps or a struct
-      with an array field), these types are also sometimes referred to as nested types.
-      See <xref href="impala_complex_types.xml#complex_types"/> for usage details.
+      The Impala complex types (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or
+      <codeph>MAP</codeph>) are available in <keyword keyref="impala23_full"/> and higher.
+      Because you can nest these types (for example, to make an array of maps or a struct with
+      an array field), these types are also sometimes referred to as nested types. See
+      <xref href="impala_complex_types.xml#complex_types"/> for usage details.
     </p>
 
-    <!-- This is kind of an obscure and rare usage scenario. Consider moving all the complex type stuff further down
+<!-- This is kind of an obscure and rare usage scenario. Consider moving all the complex type stuff further down
          after some of the more common clauses. -->
+
     <p rev="2.3.0">
       Impala can create tables containing complex type columns, with any supported file format.
       Because currently Impala can only query complex type columns in Parquet tables, creating
       tables with complex type columns and other file formats such as text is of limited use.
-      For example, you might create a text table including some columns with complex types with Impala, and use Hive
-      as part of your to ingest the nested type data and copy it to an identical Parquet table.
-      Or you might create a partitioned table containing complex type columns using one file format, and
-      use <codeph>ALTER TABLE</codeph> to change the file format of individual partitions to Parquet; Impala
-      can then query only the Parquet-format partitions in that table.
+      For example, you might create a text table including some columns with complex types with
+      Impala, and use Hive as part of your ETL pipeline to ingest the nested type data and copy it to an
+      identical Parquet table. Or you might create a partitioned table containing complex type
+      columns using one file format, and use <codeph>ALTER TABLE</codeph> to change the file
+      format of individual partitions to Parquet; Impala can then query only the Parquet-format
+      partitions in that table.
     </p>
 
     <p conref="../shared/impala_common.xml#common/complex_types_partitioning"/>
@@ -230,19 +316,20 @@ file_format:
     </p>
 
     <p>
-      By default, Impala creates an <q>internal</q> table, where Impala manages the underlying data files for the
-      table, and physically deletes the data files when you drop the table. If you specify the
-      <codeph>EXTERNAL</codeph> clause, Impala treats the table as an <q>external</q> table, where the data files
-      are typically produced outside Impala and queried from their original locations in HDFS, and Impala leaves
-      the data files in place when you drop the table. For details about internal and external tables, see
+      By default, Impala creates an <q>internal</q> table, where Impala manages the underlying
+      data files for the table, and physically deletes the data files when you drop the table.
+      If you specify the <codeph>EXTERNAL</codeph> clause, Impala treats the table as an
+      <q>external</q> table, where the data files are typically produced outside Impala and
+      queried from their original locations in HDFS, and Impala leaves the data files in place
+      when you drop the table. For details about internal and external tables, see
       <xref href="impala_tables.xml#tables"/>.
     </p>
 
     <p>
-      Typically, for an external table you include a <codeph>LOCATION</codeph> clause to specify the path to the
-      HDFS directory where Impala reads and writes files for the table. For example, if your data pipeline produces
-      Parquet files in the HDFS directory <filepath>/user/etl/destination</filepath>, you might create an external
-      table as follows:
+      Typically, for an external table you include a <codeph>LOCATION</codeph> clause to specify
+      the path to the HDFS directory where Impala reads and writes files for the table. For
+      example, if your data pipeline produces Parquet files in the HDFS directory
+      <filepath>/user/etl/destination</filepath>, you might create an external table as follows:
     </p>
 
 <codeblock>CREATE EXTERNAL TABLE external_parquet (c1 INT, c2 STRING, c3 TIMESTAMP)
@@ -250,11 +337,12 @@ file_format:
 </codeblock>
 
     <p>
-      Although the <codeph>EXTERNAL</codeph> and <codeph>LOCATION</codeph> clauses are often specified together,
-      <codeph>LOCATION</codeph> is optional for external tables, and you can also specify <codeph>LOCATION</codeph>
-      for internal tables. The difference is all about whether Impala <q>takes control</q> of the underlying data
-      files and moves them when you rename the table, or deletes them when you drop the table. For more about
-      internal and external tables and how they interact with the <codeph>LOCATION</codeph> attribute, see
+      Although the <codeph>EXTERNAL</codeph> and <codeph>LOCATION</codeph> clauses are often
+      specified together, <codeph>LOCATION</codeph> is optional for external tables, and you can
+      also specify <codeph>LOCATION</codeph> for internal tables. The difference is all about
+      whether Impala <q>takes control</q> of the underlying data files and moves them when you
+      rename the table, or deletes them when you drop the table. For more about internal and
+      external tables and how they interact with the <codeph>LOCATION</codeph> attribute, see
       <xref href="impala_tables.xml#tables"/>.
     </p>
 
@@ -263,80 +351,330 @@ file_format:
     </p>
 
     <p>
-      The <codeph>PARTITIONED BY</codeph> clause divides the data files based on the values from one or more
-      specified columns. Impala queries can use the partition metadata to minimize the amount of data that is read
-      from disk or transmitted across the network, particularly during join queries. For details about
-      partitioning, see <xref href="impala_partitioning.xml#partitioning"/>.
+      The <codeph>PARTITIONED BY</codeph> clause divides the data files based on the values from
+      one or more specified columns. Impala queries can use the partition metadata to minimize
+      the amount of data that is read from disk or transmitted across the network, particularly
+      during join queries. For details about partitioning, see
+      <xref href="impala_partitioning.xml#partitioning"/>.
     </p>
 
+    <note rev="IMPALA-3719">
+      <p>
+        All Kudu tables require partitioning, which involves different syntax than non-Kudu
+        tables. Kudu tables use the <codeph>PARTITION BY</codeph> clause rather than
+        <codeph>PARTITIONED BY</codeph>.
+      </p>
+    </note>
+
     <p rev="2.5.0">
-      Prior to <keyword keyref="impala25_full"/> you could use a partitioned table
-      as the source and copy data from it, but could not specify any partitioning clauses for the new table.
-      In <keyword keyref="impala25_full"/> and higher, you can now use the <codeph>PARTITIONED BY</codeph> clause with a
-      <codeph>CREATE TABLE AS SELECT</codeph> statement. See the examples under the following discussion of
-      the <codeph>CREATE TABLE AS SELECT</codeph> syntax variation.
+      Prior to <keyword keyref="impala25_full"/>, you could use a partitioned table as the
+      source and copy data from it, but could not specify any partitioning clauses for the new
+      table. In <keyword keyref="impala25_full"/> and higher, you can now use the
+      <codeph>PARTITIONED BY</codeph> clause with a <codeph>CREATE TABLE AS SELECT</codeph>
+      statement. See the examples under the following discussion of the <codeph>CREATE TABLE AS
+      SELECT</codeph> syntax variation.
     </p>
 
-<!--
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
     <p rev="kudu">
-      <b>Partitioning for Kudu tables (PARTITION BY clause)</b>
+      Because Kudu tables do not support clauses related to HDFS and S3 data files and
+      partitioning mechanisms, the syntax associated with the <codeph>STORED AS KUDU</codeph>
+      clause is shown separately in the above syntax descriptions. Kudu tables have their own
+      syntax for <codeph>CREATE TABLE</codeph>, <codeph>CREATE EXTERNAL TABLE</codeph>, and
+      <codeph>CREATE TABLE AS SELECT</codeph>. All internal Kudu tables require a
+      <codeph>PARTITION BY</codeph> clause, different than the <codeph>PARTITIONED BY</codeph>
+      clause for HDFS-backed tables.
+    </p>
+
+    <p>
+      Here are some examples of creating empty Kudu tables:
+    </p>
+
+<codeblock><![CDATA[
+-- Single-column primary key.
+CREATE TABLE kudu_t1 (id BIGINT PRIMARY key, s STRING, b BOOLEAN)
+  PARTITION BY HASH (id) PARTITIONS 20 STORED AS KUDU;
+
+-- Multi-column primary key.
+CREATE TABLE kudu_t2 (id BIGINT, s STRING, b BOOLEAN, PRIMARY KEY (id,s))
+  PARTITION BY HASH (s) PARTITIONS 30 STORED AS KUDU;
+
+-- Meaningful primary key column is good for range partitioning.
+CREATE TABLE kudu_t3 (id BIGINT, year INT, s STRING,
+    b BOOLEAN, PRIMARY KEY (id,year))
+  PARTITION BY HASH (id) PARTITIONS 20,
+  RANGE (year) (PARTITION 1980 <= VALUES < 1990,
+    PARTITION 1990 <= VALUES < 2000,
+    PARTITION VALUE = 2001,
+    PARTITION 2001 < VALUES)
+  STORED AS KUDU;
+]]>
+</codeblock>
+
+    <p>
+      Here is an example of creating an external Kudu table:
+    </p>
+
+<codeblock><![CDATA[
+-- Inherits column definitions from original table.
+-- For tables created through Impala, the kudu.table_name property
+-- comes from DESCRIBE FORMATTED output from the original table.
+CREATE EXTERNAL TABLE external_t1 STORED AS KUDU
+  TBLPROPERTIES ('kudu.table_name'='kudu_tbl_created_via_api');
+]]>
+</codeblock>
+
+    <p>
+      Here is an example of <codeph>CREATE TABLE AS SELECT</codeph> syntax for a Kudu table:
     </p>
 
+<codeblock><![CDATA[
+-- The CTAS statement defines the primary key and partitioning scheme.
+-- The rest of the column definitions are derived from the select list.
+CREATE TABLE ctas_t1
+  PRIMARY KEY (id) PARTITION BY HASH (id) PARTITIONS 10
+  STORED AS KUDU
+  AS SELECT id, s FROM kudu_t1;
+]]>
+</codeblock>
+
+<!-- This clause restriction and other syntax nuances come from this blog post:
+     http://blog.cloudera.com/blog/2015/11/how-to-use-impala-with-kudu/
+     which has some details and examples not found in the actual Kudu docs. -->
+
     <p rev="kudu">
+      The following <codeph>CREATE TABLE</codeph> clauses are not supported for Kudu tables:
+    </p>
+
+    <ul rev="kudu">
+      <li>
+        <codeph>PARTITIONED BY</codeph> (Kudu tables use the clause <codeph>PARTITION
+        BY</codeph> instead)
+      </li>
+
+      <li>
+        <codeph>LOCATION</codeph>
+      </li>
+
+      <li>
+        <codeph>ROW FORMAT</codeph>
+      </li>
+
+      <li>
+        <codeph>CACHED IN | UNCACHED</codeph>
+      </li>
+
+      <li>
+        <codeph>WITH SERDEPROPERTIES</codeph>
+      </li>
+    </ul>
+
+    <p rev="IMPALA-3719">
+      For more on the <codeph>PRIMARY KEY</codeph> clause, see
+      <xref href="impala_kudu.xml#kudu_primary_key"/> and
+      <xref href="impala_kudu.xml#kudu_primary_key_attribute"/>.
+    </p>
+
+    <p rev="IMPALA-3719">
+      For more on the <codeph>NULL</codeph> and <codeph>NOT NULL</codeph> attributes, see
+      <xref href="impala_kudu.xml#kudu_not_null_attribute"/>.
+    </p>
+
+    <p rev="IMPALA-3719">
+      For more on the <codeph>ENCODING</codeph> attribute, see
+      <xref href="impala_kudu.xml#kudu_encoding_attribute"/>.
+    </p>
+
+    <p rev="IMPALA-3719">
+      For more on the <codeph>COMPRESSION</codeph> attribute, see
+      <xref href="impala_kudu.xml#kudu_compression_attribute"/>.
+    </p>
+
+    <p rev="IMPALA-3719">
+      For more on the <codeph>DEFAULT</codeph> attribute, see
+      <xref href="impala_kudu.xml#kudu_default_attribute"/>.
+    </p>
+
+    <p rev="IMPALA-3719">
+      For more on the <codeph>BLOCK_SIZE</codeph> attribute, see
+      <xref href="impala_kudu.xml#kudu_block_size_attribute"/>.
+    </p>
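+
+    <p rev="IMPALA-3719">
+      As an illustration only (the table name and attribute values below are hypothetical
+      choices, not recommendations), several of these column attributes can be combined in
+      one <codeph>CREATE TABLE</codeph> statement:
+    </p>
+
+<codeblock rev="IMPALA-3719"><![CDATA[
+-- Hypothetical sketch: combines NOT NULL, ENCODING, COMPRESSION,
+-- DEFAULT, and BLOCK_SIZE attributes in a single Kudu table.
+CREATE TABLE kudu_attrs
+(
+  id BIGINT NOT NULL ENCODING bit_shuffle COMPRESSION lz4,
+  s STRING NULL DEFAULT 'unknown' BLOCK_SIZE 16777216,
+  PRIMARY KEY (id)
+)
+PARTITION BY HASH (id) PARTITIONS 10
+STORED AS KUDU;
+]]>
+</codeblock>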
+
+    <p rev="kudu IMPALA-3719">
+      <b>Partitioning for Kudu tables (PARTITION BY clause)</b>
+    </p>
+
+    <p rev="kudu IMPALA-3719">
       For Kudu tables, you specify logical partitioning across one or more columns using the
       <codeph>PARTITION BY</codeph> clause. In contrast to partitioning for HDFS-based tables,
-      multiple values for a partition key column can be located in the same partition.
-      The optional <codeph>HASH</codeph> clause lets you divide one or a set of partition key columns
-      into a specified number of partitions; you can use more than one <codeph>HASH</codeph>
-      clause, specifying a distinct set of partition key columns for each.
-      The optional <codeph>RANGE</codeph> clause further subdivides the partitions, based on
-      a range of values for the partition key columns.
+      multiple values for a partition key column can be located in the same partition. The
+      optional <codeph>HASH</codeph> clause lets you divide one or a set of partition key
+      columns into a specified number of buckets. You can use more than one
+      <codeph>HASH</codeph> clause, specifying a distinct set of partition key columns for each.
+      The optional <codeph>RANGE</codeph> clause further subdivides the partitions, based on a
+      set of comparison operations for the partition key columns.
+    </p>
+
+    <p rev="kudu IMPALA-3719">
+      Here are some examples of the <codeph>PARTITION BY HASH</codeph> syntax:
+    </p>
+
+<codeblock rev="kudu IMPALA-3719"><![CDATA[
+-- Apply hash function to 1 primary key column.
+create table hash_t1 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (x) partitions 10
+  stored as kudu;
+
+-- Apply hash function to a different primary key column.
+create table hash_t2 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (y) partitions 10
+  stored as kudu;
+
+-- Apply hash function to both primary key columns.
+-- In this case, the total number of partitions is 10.
+create table hash_t3 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (x,y) partitions 10
+  stored as kudu;
+
+-- When the column list is omitted, apply hash function to all primary key columns.
+create table hash_t4 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash partitions 10
+  stored as kudu;
+
+-- Hash the X values independently from the Y values.
+-- In this case, the total number of partitions is 10 x 20.
+create table hash_t5 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (x) partitions 10, hash (y) partitions 20
+  stored as kudu;
+]]>
+</codeblock>
+
+    <p rev="kudu IMPALA-3719">
+      Here are some examples of the <codeph>PARTITION BY RANGE</codeph> syntax:
+    </p>
+
+<codeblock rev="kudu IMPALA-3719"><![CDATA[
+-- Create partitions that cover every possible value of X.
+-- Ranges that span multiple values use the keyword VALUES between
+-- a pair of comparison operators (each either < or <=).
+create table range_t1 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (x)
+    (
+      partition 0 <= values <= 49, partition 50 <= values <= 100,
+      partition values < 0, partition 100 < values
+    )
+  stored as kudu;
+
+-- Create partitions that cover some possible values of X.
+-- Values outside the covered range(s) are rejected.
+-- New range partitions can be added through ALTER TABLE.
+create table range_t2 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (x)
+    (
+      partition 0 <= values <= 49, partition 50 <= values <= 100
+    )
+  stored as kudu;
+
+-- A range can also specify a single specific value, using the keyword VALUE
+-- with an = comparison.
+create table range_t3 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (s)
+    (
+      partition value = 'Yes', partition value = 'No', partition value = 'Maybe'
+    )
+  stored as kudu;
+
+-- Using multiple columns in the RANGE clause and tuples inside the partition spec
+-- only works for partitions specified with the VALUE= syntax.
+create table range_t4 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (x,s)
+    (
+      partition value = (0,'zero'), partition value = (1,'one'), partition value = (2,'two')
+    )
+  stored as kudu;
+]]>
+</codeblock>
+
+    <p rev="kudu IMPALA-3719">
+      Here are some examples combining both <codeph>HASH</codeph> and <codeph>RANGE</codeph>
+      syntax for the <codeph>PARTITION BY</codeph> clause:
+    </p>
+
+<codeblock rev="kudu IMPALA-3719"><![CDATA[
+-- Values from each range partition are hashed into 10 associated buckets.
+-- Total number of partitions in this case is 10 x 2.
+create table combined_t1 (x bigint, s string, s2 string, primary key (x, s))
+  partition by hash (x) partitions 10, range (x)
+    (
+      partition 0 <= values <= 49, partition 50 <= values <= 100
+    )
+  stored as kudu;
+
+-- The hash partitioning and range partitioning can apply to different columns.
+-- But all the columns used in either partitioning scheme must be from the primary key.
+create table combined_t2 (x bigint, s string, s2 string, primary key (x, s))
+  partition by hash (s) partitions 10, range (x)
+    (
+      partition 0 <= values <= 49, partition 50 <= values <= 100
+    )
+  stored as kudu;
+]]>
+</codeblock>
+
+    <p rev="kudu IMPALA-3719">
+      For more usage details and examples of the Kudu partitioning syntax, see
+      <xref keyref="impala_kudu"/>.
     </p>
--->
 
     <p>
       <b>Specifying file format (STORED AS and ROW FORMAT clauses):</b>
     </p>
 
     <p rev="DOCS-1523">
-      The <codeph>STORED AS</codeph> clause identifies the format of the underlying data files. Currently, Impala
-      can query more types of file formats than it can create or insert into. Use Hive to perform any create or
-      data load operations that are not currently available in Impala. For example, Impala can create an Avro,
-      SequenceFile, or RCFile table but cannot insert data into it. There are also Impala-specific procedures for using
-      compression with each kind of file format. For details about working with data files of various formats, see
-      <xref href="impala_file_formats.xml#file_formats"/>.
+      The <codeph>STORED AS</codeph> clause identifies the format of the underlying data files.
+      Currently, Impala can query more types of file formats than it can create or insert into.
+      Use Hive to perform any create or data load operations that are not currently available in
+      Impala. For example, Impala can create an Avro, SequenceFile, or RCFile table but cannot
+      insert data into it. There are also Impala-specific procedures for using compression with
+      each kind of file format. For details about working with data files of various formats,
+      see <xref href="impala_file_formats.xml#file_formats"/>.
     </p>
 
     <note>
-      In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing the <codeph>CREATE
-      TABLE</codeph> statement in Hive. See <xref href="impala_avro.xml#avro"/> for details and examples.
+      In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing
+      the <codeph>CREATE TABLE</codeph> statement in Hive. See
+      <xref href="impala_avro.xml#avro"/> for details and examples.
     </note>
 
     <p>
-      By default (when no <codeph>STORED AS</codeph> clause is specified), data files in Impala tables are created
-      as text files with Ctrl-A (hex 01) characters as the delimiter.
+      By default (when no <codeph>STORED AS</codeph> clause is specified), data files in Impala
+      tables are created as text files with Ctrl-A (hex 01) characters as the delimiter.
 <!-- Verify if ROW FORMAT is entirely ignored outside of text tables, or does it apply somehow to SequenceFile and/or RCFile too? -->
-      Specify the <codeph>ROW FORMAT DELIMITED</codeph> clause to produce or ingest data files that use a different
-      delimiter character such as tab or <codeph>|</codeph>, or a different line end character such as carriage
-      return or newline. When specifying delimiter and line end characters with the <codeph>FIELDS TERMINATED
-      BY</codeph> and <codeph>LINES TERMINATED BY</codeph> clauses, use <codeph>'\t'</codeph> for tab,
-      <codeph>'\n'</codeph> for newline or linefeed, <codeph>'\r'</codeph> for carriage return, and
-      <codeph>\</codeph><codeph>0</codeph> for ASCII <codeph>nul</codeph> (hex 00). For more examples of text
-      tables, see <xref href="impala_txtfile.xml#txtfile"/>.
+      Specify the <codeph>ROW FORMAT DELIMITED</codeph> clause to produce or ingest data files
+      that use a different delimiter character such as tab or <codeph>|</codeph>, or a different
+      line end character such as carriage return or newline. When specifying delimiter and line
+      end characters with the <codeph>FIELDS TERMINATED BY</codeph> and <codeph>LINES TERMINATED
+      BY</codeph> clauses, use <codeph>'\t'</codeph> for tab, <codeph>'\n'</codeph> for newline
+      or linefeed, <codeph>'\r'</codeph> for carriage return, and
+      <codeph>\0</codeph> for ASCII <codeph>nul</codeph> (hex 00). For more
+      examples of text tables, see <xref href="impala_txtfile.xml#txtfile"/>.
     </p>
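
    <p>
      For example, the following sketch (with a hypothetical table name) declares a
      tab-delimited text table using the clauses described above:
    </p>

<codeblock><![CDATA[
-- Hypothetical example: a text table with tab field delimiters
-- and newline line endings.
CREATE TABLE tsv_example (id INT, name STRING)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
  STORED AS TEXTFILE;
]]>
</codeblock>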
 
     <p>
-      The <codeph>ESCAPED BY</codeph> clause applies both to text files that you create through an
-      <codeph>INSERT</codeph> statement to an Impala <codeph>TEXTFILE</codeph> table, and to existing data files
-      that you put into an Impala table directory. (You can ingest existing data files either by creating the table
-      with <codeph>CREATE EXTERNAL TABLE ... LOCATION</codeph>, the <codeph>LOAD DATA</codeph> statement, or
-      through an HDFS operation such as <codeph>hdfs dfs -put <varname>file</varname>
-      <varname>hdfs_path</varname></codeph>.) Choose an escape character that is not used anywhere else in the
-      file, and put it in front of each instance of the delimiter character that occurs within a field value.
-      Surrounding field values with quotation marks does not help Impala to parse fields with embedded delimiter
-      characters; the quotation marks are considered to be part of the column value. If you want to use
-      <codeph>\</codeph> as the escape character, specify the clause in <cmdname>impala-shell</cmdname> as
-      <codeph>ESCAPED BY '\\'</codeph>.
+      The <codeph>ESCAPED BY</codeph> clause applies both to text files that you create through
+      an <codeph>INSERT</codeph> statement to an Impala <codeph>TEXTFILE</codeph> table, and to
+      existing data files that you put into an Impala table directory. (You can ingest existing
+      data files either by creating the table with <codeph>CREATE EXTERNAL TABLE ...
+      LOCATION</codeph>, the <codeph>LOAD DATA</codeph> statement, or through an HDFS operation
+      such as <codeph>hdfs dfs -put <varname>file</varname>
+      <varname>hdfs_path</varname></codeph>.) Choose an escape character that is not used
+      anywhere else in the file, and put it in front of each instance of the delimiter character
+      that occurs within a field value. Surrounding field values with quotation marks does not
+      help Impala to parse fields with embedded delimiter characters; the quotation marks are
+      considered to be part of the column value. If you want to use <codeph>\</codeph> as the
+      escape character, specify the clause in <cmdname>impala-shell</cmdname> as <codeph>ESCAPED
+      BY '\\'</codeph>.
     </p>
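
    <p>
      For example, the following sketch (with a hypothetical table name) designates
      backslash as the escape character:
    </p>

<codeblock><![CDATA[
-- Hypothetical example: in impala-shell, write the backslash escape
-- character as '\\'.
CREATE TABLE escaped_example (c1 STRING, c2 STRING)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
  STORED AS TEXTFILE;
]]>
</codeblock>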
 
     <note conref="../shared/impala_common.xml#common/thorn"/>
@@ -346,10 +684,10 @@ file_format:
     </p>
 
     <p>
-      To create an empty table with the same columns, comments, and other attributes as another table, use the
-      following variation. The <codeph>CREATE TABLE ... LIKE</codeph> form allows a restricted set of clauses,
-      currently only the <codeph>LOCATION</codeph>, <codeph>COMMENT</codeph>, and <codeph>STORED AS</codeph>
-      clauses.
+      To create an empty table with the same columns, comments, and other attributes as another
+      table, use the following variation. The <codeph>CREATE TABLE ... LIKE</codeph> form allows
+      a restricted set of clauses, currently only the <codeph>LOCATION</codeph>,
+      <codeph>COMMENT</codeph>, and <codeph>STORED AS</codeph> clauses.
     </p>
 
 <codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
@@ -360,45 +698,49 @@ file_format:
 
     <note rev="1.2.0">
       <p rev="1.2.0">
-        To clone the structure of a table and transfer data into it in a single operation, use the <codeph>CREATE
-        TABLE AS SELECT</codeph> syntax described in the next subsection.
+        To clone the structure of a table and transfer data into it in a single operation, use
+        the <codeph>CREATE TABLE AS SELECT</codeph> syntax described in the next subsection.
       </p>
     </note>
 
     <p>
-      When you clone the structure of an existing table using the <codeph>CREATE TABLE ... LIKE</codeph> syntax,
-      the new table keeps the same file format as the original one, so you only need to specify the <codeph>STORED
-      AS</codeph> clause if you want to use a different file format, or when specifying a view as the original
-      table. (Creating a table <q>like</q> a view produces a text table by default.)
+      When you clone the structure of an existing table using the <codeph>CREATE TABLE ...
+      LIKE</codeph> syntax, the new table keeps the same file format as the original one, so you
+      only need to specify the <codeph>STORED AS</codeph> clause if you want to use a different
+      file format, or when specifying a view as the original table. (Creating a table
+      <q>like</q> a view produces a text table by default.)
     </p>
 
     <p>
-      Although normally Impala cannot create an HBase table directly, Impala can clone the structure of an existing
-      HBase table with the <codeph>CREATE TABLE ... LIKE</codeph> syntax, preserving the file format and metadata
-      from the original table.
+      Although normally Impala cannot create an HBase table directly, Impala can clone the
+      structure of an existing HBase table with the <codeph>CREATE TABLE ... LIKE</codeph>
+      syntax, preserving the file format and metadata from the original table.
     </p>
 
     <p>
-      There are some exceptions to the ability to use <codeph>CREATE TABLE ... LIKE</codeph> with an Avro table.
-      For example, you cannot use this technique for an Avro table that is specified with an Avro schema but no
-      columns. When in doubt, check if a <codeph>CREATE TABLE ... LIKE</codeph> operation works in Hive; if not, it
-      typically will not work in Impala either.
+      There are some exceptions to the ability to use <codeph>CREATE TABLE ... LIKE</codeph>
+      with an Avro table. For example, you cannot use this technique for an Avro table that is
+      specified with an Avro schema but no columns. When in doubt, check if a <codeph>CREATE
+      TABLE ... LIKE</codeph> operation works in Hive; if not, it typically will not work in
+      Impala either.
     </p>
 
     <p>
-      If the original table is partitioned, the new table inherits the same partition key columns. Because the new
-      table is initially empty, it does not inherit the actual partitions that exist in the original one. To create
-      partitions in the new table, insert data or issue <codeph>ALTER TABLE ... ADD PARTITION</codeph> statements.
+      If the original table is partitioned, the new table inherits the same partition key
+      columns. Because the new table is initially empty, it does not inherit the actual
+      partitions that exist in the original one. To create partitions in the new table, insert
+      data or issue <codeph>ALTER TABLE ... ADD PARTITION</codeph> statements.
     </p>
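
    <p>
      For example, with hypothetical table names:
    </p>

<codeblock><![CDATA[
-- Hypothetical example: clone the structure of a partitioned table,
-- then create a partition in the initially empty clone.
CREATE TABLE sales_copy LIKE sales_partitioned;
ALTER TABLE sales_copy ADD PARTITION (year = 2017);
]]>
</codeblock>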
 
     <p conref="../shared/impala_common.xml#common/create_table_like_view"/>
 
     <p>
-      Because <codeph>CREATE TABLE ... LIKE</codeph> only manipulates table metadata, not the physical data of the
-      table, issue <codeph>INSERT INTO TABLE</codeph> statements afterward to copy any data from the original table
-      into the new one, optionally converting the data to a new file format. (For some file formats, Impala can do
-      a <codeph>CREATE TABLE ... LIKE</codeph> to create the table, but Impala cannot insert data in that file
-      format; in these cases, you must load the data in Hive. See
+      Because <codeph>CREATE TABLE ... LIKE</codeph> only manipulates table metadata, not the
+      physical data of the table, issue <codeph>INSERT INTO TABLE</codeph> statements afterward
+      to copy any data from the original table into the new one, optionally converting the data
+      to a new file format. (For some file formats, Impala can do a <codeph>CREATE TABLE ...
+      LIKE</codeph> to create the table, but Impala cannot insert data in that file format; in
+      these cases, you must load the data in Hive. See
       <xref href="impala_file_formats.xml#file_formats"/> for details.)
     </p>
 
@@ -407,16 +749,16 @@ file_format:
     </p>
 
     <p>
-      The <codeph>CREATE TABLE AS SELECT</codeph> syntax is a shorthand notation to create a table based on column
-      definitions from another table, and copy data from the source table to the destination table without issuing
-      any separate <codeph>INSERT</codeph> statement. This idiom is so popular that it has its own acronym,
-      <q>CTAS</q>.
+      The <codeph>CREATE TABLE AS SELECT</codeph> syntax is a shorthand notation to create a
+      table based on column definitions from another table, and copy data from the source table
+      to the destination table without issuing any separate <codeph>INSERT</codeph> statement.
+      This idiom is so popular that it has its own acronym, <q>CTAS</q>.
     </p>
 
     <p>
-      The following examples show how to copy data from a source table <codeph>T1</codeph>
-      to a variety of destinations tables, applying various transformations to the table
-      properties, table layout, or the data itself as part of the operation:
+      The following examples show how to copy data from a source table <codeph>T1</codeph> to a
+      variety of destination tables, applying various transformations to the table properties,
+      table layout, or the data itself as part of the operation:
     </p>
 
 <codeblock>
@@ -484,34 +826,36 @@ CREATE TABLE synthesized_columns AS SELECT upper(c1) AS all_caps, c2+c3 AS total
 <!-- CREATE TABLE AS <select> now incorporated up higher in the original syntax diagram. -->
 
     <p rev="1.2">
-      See <xref href="impala_select.xml#select"/> for details about query syntax for the <codeph>SELECT</codeph>
-      portion of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+      See <xref href="impala_select.xml#select"/> for details about query syntax for the
+      <codeph>SELECT</codeph> portion of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
     </p>
 
     <p rev="1.2">
-      The newly created table inherits the column names that you select from the original table, which you can
-      override by specifying column aliases in the query. Any column or table comments from the original table are
-      not carried over to the new table.
+      The newly created table inherits the column names that you select from the original table,
+      which you can override by specifying column aliases in the query. Any column or table
+      comments from the original table are not carried over to the new table.
     </p>
 
     <note rev="DOCS-1523">
-      When using the <codeph>STORED AS</codeph> clause with a <codeph>CREATE TABLE AS SELECT</codeph>
-      statement, the destination table must be a file format that Impala can write to: currently,
-      text or Parquet. You cannot specify an Avro, SequenceFile, or RCFile table as the destination
-      table for a CTAS operation.
+      When using the <codeph>STORED AS</codeph> clause with a <codeph>CREATE TABLE AS
+      SELECT</codeph> statement, the destination table must be a file format that Impala can
+      write to: currently, text or Parquet. You cannot specify an Avro, SequenceFile, or RCFile
+      table as the destination table for a CTAS operation.
     </note>
 
     <p rev="2.5.0">
-      Prior to <keyword keyref="impala25_full"/> you could use a partitioned table
-      as the source and copy data from it, but could not specify any partitioning clauses for the new table.
-      In <keyword keyref="impala25_full"/> and higher, you can now use the <codeph>PARTITIONED BY</codeph> clause with a
-      <codeph>CREATE TABLE AS SELECT</codeph> statement. The following example demonstrates how you can copy
-      data from an unpartitioned table in a <codeph>CREATE TABLE AS SELECT</codeph> operation, creating a new
-      partitioned table in the process. The main syntax consideration is the column order in the <codeph>PARTITIONED BY</codeph>
-      clause and the select list: the partition key columns must be listed last in the select list, in the same
-      order as in the <codeph>PARTITIONED BY</codeph> clause. Therefore, in this case, the column order in the
-      destination table is different from the source table. You also only specify the column names in the
-      <codeph>PARTITIONED BY</codeph> clause, not the data types or column comments.
+      Prior to <keyword keyref="impala25_full"/> you could use a partitioned table as the source
+      and copy data from it, but could not specify any partitioning clauses for the new table.
+      In <keyword keyref="impala25_full"/> and higher, you can now use the <codeph>PARTITIONED
+      BY</codeph> clause with a <codeph>CREATE TABLE AS SELECT</codeph> statement. The following
+      example demonstrates how you can copy data from an unpartitioned table in a <codeph>CREATE
+      TABLE AS SELECT</codeph> operation, creating a new partitioned table in the process. The
+      main syntax consideration is the column order in the <codeph>PARTITIONED BY</codeph>
+      clause and the select list: the partition key columns must be listed last in the select
+      list, in the same order as in the <codeph>PARTITIONED BY</codeph> clause. Therefore, in
+      this case, the column order in the destination table is different from the source table.
+      You also only specify the column names in the <codeph>PARTITIONED BY</codeph> clause, not
+      the data types or column comments.
     </p>
 
 <codeblock rev="2.5.0">
@@ -544,12 +888,11 @@ show partitions partitions_yes;
 +-------+-------+-------+--------+------+...
 </codeblock>
 
-  <p rev="2.5.0">
-    The most convenient layout for partitioned tables is with all the
-    partition key columns at the end. The CTAS <codeph>PARTITIONED BY</codeph> syntax
-    requires that column order in the select list, resulting in that same
-    column order in the destination table.
-  </p>
+    <p rev="2.5.0">
+      The most convenient layout for partitioned tables is with all the partition key columns at
+      the end. The CTAS <codeph>PARTITIONED BY</codeph> syntax requires that column order in the
+      select list, resulting in that same column order in the destination table.
+    </p>
 
 <codeblock rev="2.5.0">
 describe partitions_no;
@@ -574,10 +917,10 @@ describe partitions_yes;
 +-------+----------+---------+
 </codeblock>
 
-  <p rev="2.5.0">
-    Attempting to use a select list with the partition key columns
-    not at the end results in an error due to a column name mismatch:
-  </p>
+    <p rev="2.5.0">
+      Attempting to use a select list with the partition key columns not at the end results in
+      an error due to a column name mismatch:
+    </p>
 
 <codeblock rev="2.5.0">
 -- We expect this CTAS to fail because non-key column S
@@ -588,14 +931,15 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
 </codeblock>
 
     <p rev="1.2">
-      For example, the following statements show how you can clone all the data in a table, or a subset of the
-      columns and/or rows, or reorder columns, rename them, or construct them out of expressions:
+      For example, the following statements show how you can clone all the data in a table, or a
+      subset of the columns and/or rows, or reorder columns, rename them, or construct them out
+      of expressions:
     </p>
 
     <p rev="1.2">
-      As part of a CTAS operation, you can convert the data to any file format that Impala can write (currently,
-      <codeph>TEXTFILE</codeph> and <codeph>PARQUET</codeph>). You cannot specify the lower-level properties of a
-      text table, such as the delimiter.
+      As part of a CTAS operation, you can convert the data to any file format that Impala can
+      write (currently, <codeph>TEXTFILE</codeph> and <codeph>PARQUET</codeph>). You cannot
+      specify the lower-level properties of a text table, such as the delimiter.
     </p>
 
     <p rev="obwl" conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
@@ -605,57 +949,65 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
     </p>
 
     <p rev="1.4.0">
-      The variation <codeph>CREATE TABLE ... LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'</codeph>
-      lets you skip the column definitions of the <codeph>CREATE TABLE</codeph> statement. The column names and
-      data types are automatically configured based on the organization of the specified Parquet data file, which
-      must already reside in HDFS. You can use a data file located outside the Impala database directories, or a
-      file from an existing Impala Parquet table; either way, Impala only uses the column definitions from the file
-      and does not use the HDFS location for the <codeph>LOCATION</codeph> attribute of the new table. (Although
-      you can also specify the enclosing directory with the <codeph>LOCATION</codeph> attribute, to both use the
-      same schema as the data file and point the Impala table at the associated directory for querying.)
+      The variation <codeph>CREATE TABLE ... LIKE PARQUET
+      '<varname>hdfs_path_of_parquet_file</varname>'</codeph> lets you skip the column
+      definitions of the <codeph>CREATE TABLE</codeph> statement. The column names and data
+      types are automatically configured based on the organization of the specified Parquet data
+      file, which must already reside in HDFS. You can use a data file located outside the
+      Impala database directories, or a file from an existing Impala Parquet table; either way,
+      Impala only uses the column definitions from the file and does not use the HDFS location
+      for the <codeph>LOCATION</codeph> attribute of the new table. (Although you can also
+      specify the enclosing directory with the <codeph>LOCATION</codeph> attribute, to both use
+      the same schema as the data file and point the Impala table at the associated directory
+      for querying.)
     </p>
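
    <p rev="1.4.0">
      For example, with a hypothetical HDFS path and table name:
    </p>

<codeblock rev="1.4.0"><![CDATA[
-- Hypothetical example: derive column names and types from an existing
-- Parquet data file, and make the new table use Parquet format too.
CREATE TABLE parquet_clone
  LIKE PARQUET '/user/hive/warehouse/db1.db/orig_table/data_file.parq'
  STORED AS PARQUET;
]]>
</codeblock>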
 
     <p rev="1.4.0">
-      The following considerations apply when you use the <codeph>CREATE TABLE LIKE PARQUET</codeph> technique:
+      The following considerations apply when you use the <codeph>CREATE TABLE LIKE
+      PARQUET</codeph> technique:
     </p>
 
     <ul rev="1.4.0">
       <li>
-        Any column comments from the original table are not preserved in the new table. Each column in the new
-        table has a comment stating the low-level Parquet field type used to deduce the appropriate SQL column
-        type.
+        Any column comments from the original table are not preserved in the new table. Each
+        column in the new table has a comment stating the low-level Parquet field type used to
+        deduce the appropriate SQL column type.
       </li>
 
       <li>
-        If you use a data file from a partitioned Impala table, any partition key columns from the original table
-        are left out of the new table, because they are represented in HDFS directory names rather than stored in
-        the data file. To preserve the partition information, repeat the same <codeph>PARTITION</codeph> clause as
-        in the original <codeph>CREATE TABLE</codeph> statement.
+        If you use a data file from a partitioned Impala table, any partition key columns from
+        the original table are left out of the new table, because they are represented in HDFS
+        directory names rather than stored in the data file. To preserve the partition
+        information, repeat the same <codeph>PARTITION</codeph> clause as in the original
+        <codeph>CREATE TABLE</codeph> statement.
       </li>
 
       <li>
-        The file format of the new table defaults to text, as with other kinds of <codeph>CREATE TABLE</codeph>
-        statements. To make the new table also use Parquet format, include the clause <codeph>STORED AS
-        PARQUET</codeph> in the <codeph>CREATE TABLE LIKE PARQUET</codeph> statement.
+        The file format of the new table defaults to text, as with other kinds of <codeph>CREATE
+        TABLE</codeph> statements. To make the new table also use Parquet format, include the
+        clause <codeph>STORED AS PARQUET</codeph> in the <codeph>CREATE TABLE LIKE
+        PARQUET</codeph> statement.
       </li>
 
       <li>
-        If the Parquet data file comes from an existing Impala table, currently, any <codeph>TINYINT</codeph> or
-        <codeph>SMALLINT</codeph> columns are turned into <codeph>INT</codeph> columns in the new table.
-        Internally, Parquet stores such values as 32-bit integers.
+        If the Parquet data file comes from an existing Impala table, currently, any
+        <codeph>TINYINT</codeph> or <codeph>SMALLINT</codeph> columns are turned into
+        <codeph>INT</codeph> columns in the new table. Internally, Parquet stores such values as
+        32-bit integers.
       </li>
 
       <li>
-        When the destination table uses the Parquet file format, the <codeph>CREATE TABLE AS SELECT</codeph> and
-        <codeph>INSERT ... SELECT</codeph> statements always create at least one data file, even if the
-        <codeph>SELECT</codeph> part of the statement does not match any rows. You can use such an empty Parquet
-        data file as a template for subsequent <codeph>CREATE TABLE LIKE PARQUET</codeph> statements.
+        When the destination table uses the Parquet file format, the <codeph>CREATE TABLE AS
+        SELECT</codeph> and <codeph>INSERT ... SELECT</codeph> statements always create at least
+        one data file, even if the <codeph>SELECT</codeph> part of the statement does not match
+        any rows. You can use such an empty Parquet data file as a template for subsequent
+        <codeph>CREATE TABLE LIKE PARQUET</codeph> statements.
       </li>
     </ul>
 
     <p>
-      For more details about creating Parquet tables, and examples of the <codeph>CREATE TABLE LIKE
-      PARQUET</codeph> syntax, see <xref href="impala_parquet.xml#parquet"/>.
+      For more details about creating Parquet tables, and examples of the <codeph>CREATE TABLE
+      LIKE PARQUET</codeph> syntax, see <xref href="impala_parquet.xml#parquet"/>.
     </p>
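The points above can be combined in a single hypothetical statement; the data-file path and table name here are placeholders, not values from this document:

```sql
-- Derive column definitions from an existing Parquet data file,
-- store the new table as Parquet (otherwise it defaults to text),
-- and re-declare the partitioning that the data file cannot carry.
CREATE TABLE new_events
  LIKE PARQUET '/user/hive/warehouse/events/year=2017/data_file.parq'
  PARTITIONED BY (year INT)
  STORED AS PARQUET;
```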
 
     <p>
@@ -663,25 +1015,28 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
     </p>
 
     <p rev="1.2">
-      You can associate arbitrary items of metadata with a table by specifying the <codeph>TBLPROPERTIES</codeph>
-      clause. This clause takes a comma-separated list of key-value pairs and stores those items in the metastore
-      database. You can also change the table properties later with an <codeph>ALTER TABLE</codeph> statement. You
-      can observe the table properties for different delimiter and escape characters using the <codeph>DESCRIBE
-      FORMATTED</codeph> command, and change those settings for an existing table with <codeph>ALTER TABLE ... SET
-      TBLPROPERTIES</codeph>.
+      You can associate arbitrary items of metadata with a table by specifying the
+      <codeph>TBLPROPERTIES</codeph> clause. This clause takes a comma-separated list of
+      key-value pairs and stores those items in the metastore database. You can also change the
+      table properties later with an <codeph>ALTER TABLE</codeph> statement. You can observe the
+      table properties for different delimiter and escape characters using the <codeph>DESCRIBE
+      FORMATTED</codeph> command, and change those settings for an existing table with
+      <codeph>ALTER TABLE ... SET TBLPROPERTIES</codeph>.
     </p>
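As a sketch of the workflow described above (property names and values are illustrative only):

```sql
-- Attach arbitrary key-value metadata at creation time,
-- then revise it later with ALTER TABLE ... SET TBLPROPERTIES.
CREATE TABLE t1 (x INT)
  TBLPROPERTIES ('source' = 'legacy_system', 'owner' = 'etl_team');
ALTER TABLE t1 SET TBLPROPERTIES ('owner' = 'analytics_team');
-- DESCRIBE FORMATTED t1 then shows the current property values.
```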
 
     <p rev="1.2">
-      You can also associate SerDes properties with the table by specifying key-value pairs through the
-      <codeph>WITH SERDEPROPERTIES</codeph> clause. This metadata is not used by Impala, which has its own built-in
-      serializer and deserializer for the file formats it supports. Particular property values might be needed for
-      Hive compatibility with certain variations of file formats, particularly Avro.
+      You can also associate SerDes properties with the table by specifying key-value pairs
+      through the <codeph>WITH SERDEPROPERTIES</codeph> clause. This metadata is not used by
+      Impala, which has its own built-in serializer and deserializer for the file formats it
+      supports. Particular property values might be needed for Hive compatibility with certain
+      variations of file formats, particularly Avro.
     </p>
 
     <p>
-      Some DDL operations that interact with other Hadoop components require specifying particular values in the
-      <codeph>SERDEPROPERTIES</codeph> or <codeph>TBLPROPERTIES</codeph> fields, such as creating an Avro table or
-      an HBase table. (You typically create HBase tables in Hive, because they require additional clauses not
+      Some DDL operations that interact with other Hadoop components require specifying
+      particular values in the <codeph>SERDEPROPERTIES</codeph> or
+      <codeph>TBLPROPERTIES</codeph> fields, such as creating an Avro table or an HBase table.
+      (You typically create HBase tables in Hive, because they require additional clauses not
       currently available in Impala.)
 <!-- Haven't got a working example from Lenni, so suppressing this recommendation for now.
         The Avro schema properties can be specified through either
@@ -692,21 +1047,23 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
     </p>
 
     <p>
-      To see the column definitions and column comments for an existing table, for example before issuing a
-      <codeph>CREATE TABLE ... LIKE</codeph> or a <codeph>CREATE TABLE ... AS SELECT</codeph> statement, issue the
-      statement <codeph>DESCRIBE <varname>table_name</varname></codeph>. To see even more detail, such as the
-      location of data files and the values for clauses such as <codeph>ROW FORMAT</codeph> and <codeph>STORED
-      AS</codeph>, issue the statement <codeph>DESCRIBE FORMATTED <varname>table_name</varname></codeph>.
-      <codeph>DESCRIBE FORMATTED</codeph> is also needed to see any overall table comment (as opposed to individual
-      column comments).
+      To see the column definitions and column comments for an existing table, for example
+      before issuing a <codeph>CREATE TABLE ... LIKE</codeph> or a <codeph>CREATE TABLE ... AS
+      SELECT</codeph> statement, issue the statement <codeph>DESCRIBE
+      <varname>table_name</varname></codeph>. To see even more detail, such as the location of
+      data files and the values for clauses such as <codeph>ROW FORMAT</codeph> and
+      <codeph>STORED AS</codeph>, issue the statement <codeph>DESCRIBE FORMATTED
+      <varname>table_name</varname></codeph>. <codeph>DESCRIBE FORMATTED</codeph> is also needed
+      to see any overall table comment (as opposed to individual column comments).
     </p>
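For example, assuming a table named <codeph>t1</codeph>:

```sql
-- Quick listing of column definitions and column comments,
-- suitable before a CREATE TABLE ... LIKE or CTAS statement:
DESCRIBE t1;
-- Full detail, including LOCATION, ROW FORMAT, STORED AS,
-- and any overall table comment:
DESCRIBE FORMATTED t1;
```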
 
     <p>
       After creating a table, your <cmdname>impala-shell</cmdname> session or another
-      <cmdname>impala-shell</cmdname> connected to the same node can immediately query that table. There might be a
-      brief interval (one statestore heartbeat) before the table can be queried through a different Impala node. To
-      make the <codeph>CREATE TABLE</codeph> statement return only when the table is recognized by all Impala nodes
-      in the cluster, enable the <codeph>SYNC_DDL</codeph> query option.
+      <cmdname>impala-shell</cmdname> connected to the same node can immediately query that
+      table. There might be a brief interval (one statestore heartbeat) before the table can be
+      queried through a different Impala node. To make the <codeph>CREATE TABLE</codeph>
+      statement return only when the table is recognized by all Impala nodes in the cluster,
+      enable the <codeph>SYNC_DDL</codeph> query option.
     </p>
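A minimal sketch of using the query option mentioned above (table name is illustrative):

```sql
-- Make CREATE TABLE return only once the new table is
-- recognized by all Impala nodes in the cluster:
SET SYNC_DDL=1;
CREATE TABLE t2 (x INT);
SET SYNC_DDL=0;
```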
 
     <p rev="1.4.0">
@@ -714,9 +1071,10 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
     </p>
 
     <p rev="1.4.0">
-      If you specify the <codeph>CACHED IN</codeph> clause, any existing or future data files in the table
-      directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching
-      mechanism. See <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details about using the HDFS
+      If you specify the <codeph>CACHED IN</codeph> clause, any existing or future data files in
+      the table directory or the partition subdirectories are designated to be loaded into
+      memory with the HDFS caching mechanism. See
+      <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details about using the HDFS
       caching feature.
     </p>
 
@@ -729,42 +1087,44 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
     </p>
 
     <p>
-      If you intend to use the table to hold data files produced by some external source, specify the columns in
-      the same order as they appear in the data files.
+      If you intend to use the table to hold data files produced by some external source,
+      specify the columns in the same order as they appear in the data files.
     </p>
 
     <p>
-      If you intend to insert or copy data into the table through Impala, or if you have control over the way
-      externally produced data files are arranged, use your judgment to specify columns in the most convenient
-      order:
+      If you intend to insert or copy data into the table through Impala, or if you have control
+      over the way externally produced data files are arranged, use your judgment to specify
+      columns in the most convenient order:
     </p>
 
     <ul>
       <li>
         <p>
-          If certain columns are often <codeph>NULL</codeph>, specify those columns last. You might produce data
-          files that omit these trailing columns entirely. Impala automatically fills in the <codeph>NULL</codeph>
-          values if so.
+          If certain columns are often <codeph>NULL</codeph>, specify those columns last. You
+          might produce data files that omit these trailing columns entirely. Impala
+          automatically fills in the <codeph>NULL</codeph> values if so.
         </p>
       </li>
 
       <li>
         <p>
-          If an unpartitioned table will be used as the source for an <codeph>INSERT ... SELECT</codeph> operation
-          into a partitioned table, specify last in the unpartitioned table any columns that correspond to
-          partition key columns in the partitioned table, and in the same order as the partition key columns are
-          declared in the partitioned table. This technique lets you use <codeph>INSERT ... SELECT *</codeph> when
-          copying data to the partitioned table, rather than specifying each column name individually.
+          If an unpartitioned table will be used as the source for an <codeph>INSERT ...
+          SELECT</codeph> operation into a partitioned table, specify last in the unpartitioned
+          table any columns that correspond to partition key columns in the partitioned table,
+          and in the same order as the partition key columns are declared in the partitioned
+          table. This technique lets you use <codeph>INSERT ... SELECT *</codeph> when copying
+          data to the partitioned table, rather than specifying each column name individually.
         </p>
       </li>
 
       <li>
         <p>
-          If you specify columns in an order that you later discover is suboptimal, you can sometimes work around
-          the problem without recreating the table. You can create a view that selects columns from the original
-          table in a permuted order, then do a <codeph>SELECT *</codeph> from the view. When inserting data into a
-          table, you can specify a permuted order for the inserted columns to match the order in the destination
-          table.
+          If you specify columns in an order that you later discover is suboptimal, you can
+          sometimes work around the problem without recreating the table. You can create a view
+          that selects columns from the original table in a permuted order, then do a
+          <codeph>SELECT *</codeph> from the view. When inserting data into a table, you can
+          specify a permuted order for the inserted columns to match the order in the
+          destination table.
         </p>
       </li>
     </ul>
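The second bullet above can be sketched as follows; all table and column names are hypothetical:

```sql
-- The unpartitioned staging table lists the future partition key
-- columns (year, month) last, in the same order as the partitioned
-- destination table declares them:
CREATE TABLE staging (id BIGINT, val STRING, year INT, month INT);
CREATE TABLE dest (id BIGINT, val STRING)
  PARTITIONED BY (year INT, month INT);
-- Because the column orders line up, SELECT * suffices and no
-- individual column names need to be spelled out:
INSERT INTO dest PARTITION (year, month) SELECT * FROM staging;
```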
@@ -772,34 +1132,37 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
     <p conref="../shared/impala_common.xml#common/hive_blurb"/>
 
     <p>
-      Impala queries can make use of metadata about the table and columns, such as the number of rows in a table or
-      the number of different values in a column. Prior to Impala 1.2.2, to create this metadata, you issued the
-      <codeph>ANALYZE TABLE</codeph> statement in Hive to gather this information, after creating the table and
-      loading representative data into it. In Impala 1.2.2 and higher, the <codeph>COMPUTE STATS</codeph> statement
-      produces these statistics within Impala, without needing to use Hive at all.
+      Impala queries can make use of metadata about the table and columns, such as the number of
+      rows in a table or the number of different values in a column. Prior to Impala 1.2.2, to
+      create this metadata, you issued the <codeph>ANALYZE TABLE</codeph> statement in Hive to
+      gather this information, after creating the table and loading representative data into it.
+      In Impala 1.2.2 and higher, the <codeph>COMPUTE STATS</codeph> statement produces these
+      statistics within Impala, without needing to use Hive at all.
     </p>
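For instance, in Impala 1.2.2 and higher the statistics workflow stays entirely within Impala (table name is a placeholder):

```sql
-- Gather table and column statistics without using Hive:
COMPUTE STATS t1;
-- Inspect the gathered statistics:
SHOW TABLE STATS t1;
SHOW COLUMN STATS t1;
```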
 
     <p conref="../shared/impala_common.xml#common/hbase_blurb"/>
 
     <note>
       <p>
-        The Impala <codeph>CREATE TABLE</codeph> statement cannot create an HBase table, because it currently does
-        not support the <codeph>STORED BY</codeph> clause needed for HBase tables. Create such tables in Hive, then
-        query them through Impala. For information on using Impala with HBase tables, see
-        <xref href="impala_hbase.xml#impala_hbase"/>.
+        The Impala <codeph>CREATE TABLE</codeph> statement cannot create an HBase table, because
+        it currently does not support the <codeph>STORED BY</codeph> clause needed for HBase
+        tables. Create such tables in Hive, then query them through Impala. For information on
+        using Impala with HBase tables, see <xref href="impala_hbase.xml#impala_hbase"/>.
       </p>
     </note>
 
     <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+
     <p rev="2.2.0">
       To create a table where the data resides in the Amazon Simple Storage Service (S3),
-      specify a <codeph>s3a://</codeph> prefix <codeph>LOCATION</codeph> attribute pointing to the data files in S3.
+      specify an <codeph>s3a://</codeph> prefix in the <codeph>LOCATION</codeph> attribute pointing to
+      the data files in S3.
     </p>
 
     <p rev="2.6.0 CDH-39913 IMPALA-1878">
-      In <keyword keyref="impala26_full"/> and higher, you can
-      use this special <codeph>LOCATION</codeph> syntax
-      as part of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+      In <keyword keyref="impala26_full"/> and higher, you can use this special
+      <codeph>LOCATION</codeph> syntax as part of a <codeph>CREATE TABLE AS SELECT</codeph>
+      statement.
     </p>
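A hedged sketch of the S3 syntax described above; the bucket and path are placeholders, not real locations:

```sql
-- External table over data already residing in S3:
CREATE EXTERNAL TABLE s3_events (id BIGINT, msg STRING)
  LOCATION 's3a://example-bucket/events/';
```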
 
     <p conref="../shared/impala_common.xml#common/s3_ddl"/>
@@ -809,29 +1172,32 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
     <p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
 
     <p>
-      The <codeph>CREATE TABLE</codeph> statement for an internal table creates a directory in HDFS. The
-      <codeph>CREATE EXTERNAL TABLE</codeph> statement associates the table with an existing HDFS directory, and
-      does not create any new directory in HDFS. To locate the HDFS data directory for a table, issue a
-      <codeph>DESCRIBE FORMATTED <varname>table</varname></codeph> statement. To examine the contents of that HDFS
-      directory, use an OS command such as <codeph>hdfs dfs -ls hdfs://<varname>path</varname></codeph>, either
-      from the OS command line or through the <codeph>shell</codeph> or <codeph>!</codeph> commands in
-      <cmdname>impala-shell</cmdname>.
+      The <codeph>CREATE TABLE</codeph> statement for an internal table creates a directory in
+      HDFS. The <codeph>CREATE EXTERNAL TABLE</codeph> statement associates the table with an
+      existing HDFS directory, and does not create any new directory in HDFS. To locate the HDFS
+      data directory for a table, issue a <codeph>DESCRIBE FORMATTED
+      <varname>table</varname></codeph> statement. To examine the contents of that HDFS
+      directory, use an OS command such as <codeph>hdfs dfs -ls
+      hdfs://<varname>path</varname></codeph>, either from the OS command line or through the
+      <codeph>shell</codeph> or <codeph>!</codeph> commands in <cmdname>impala-shell</cmdname>.
     </p>
 
     <p>
-      The <codeph>CREATE TABLE AS SELECT</codeph> syntax creates data files under the table data directory to hold
-      any data copied by the <codeph>INSERT</codeph> portion of the statement. (Even if no data is copied, Impala
-      might create one or more empty data files.)
+      The <codeph>CREATE TABLE AS SELECT</codeph> syntax creates data files under the table data
+      directory to hold any data copied by the <codeph>INSERT</codeph> portion of the statement.
+      (Even if no data is copied, Impala might create one or more empty data files.)
     </p>
 
     <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+
     <p rev="CDH-19187">
-      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
-      typically the <codeph>impala</codeph> user, must have both execute and write
-      permission for the database directory where the table is being created.
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under, typically the
+      <codeph>impala</codeph> user, must have both execute and write permission for the database
+      directory where the table is being created.
     </p>
 
     <p conref="../shared/impala_common.xml#common/security_blurb"/>
+
     <p conref="../shared/impala_common.xml#common/redaction_yes"/>
 
     <p conref="../shared/impala_common.xml#common/cancel_blurb_maybe"/>
@@ -840,11 +1206,17 @@ ERROR: AnalysisException: Partition column name mismatch: year != month
 
     <p>
       <xref href="impala_tables.xml#tables"/>,
-      <xref href="impala_alter_table.xml#alter_table"/>, <xref href="impala_drop_table.xml#drop_table"/>,
-      <xref href="impala_partitioning.xml#partitioning"/>, <xref href="impala_tables.xml#internal_tables"/>,
-      <xref href="impala_tables.xml#external_tables"/>, <xref href="impala_compute_stats.xml#compute_stats"/>,
+      <xref href="impala_alter_table.xml#alter_table"/>,
+      <xref href="impala_drop_table.xml#drop_table"/>,
+      <xref href="impala_partitioning.xml#partitioning"/>,
+      <xref href="impala_tables.xml#internal_tables"/>,
+      <xref href="impala_tables.xml#external_tables"/>,
+      <xref href="impala_compute_stats.xml#compute_stats"/>,
       <xref href="impala_sync_ddl.xml#sync_ddl"/>, <xref href="impala_show.xml#show_tables"/>,
-      <xref href="impala_show.xml#show_create_table"/>, <xref href="impala_describe.xml#describe"/>
+      <xref href="impala_show.xml#show_create_table"/>,
+      <xref href="impala_describe.xml#describe"/>
     </p>
+
   </conbody>
+
 </concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_decimal.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_decimal.xml b/docs/topics/impala_decimal.xml
index 0175911..36b149d 100644
--- a/docs/topics/impala_decimal.xml
+++ b/docs/topics/impala_decimal.xml
@@ -822,6 +822,9 @@ SELECT CAST(1000.5 AS DECIMAL);
 
     <p conref="../shared/impala_common.xml#common/column_stats_constant"/>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_describe.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_describe.xml b/docs/topics/impala_describe.xml
index 2548d44..1f7ce9b 100644
--- a/docs/topics/impala_describe.xml
+++ b/docs/topics/impala_describe.xml
@@ -697,6 +697,91 @@ Returned 27 row(s) in 0.17s</codeblock>
       in an arbitrary HDFS directory based on its <codeph>LOCATION</codeph> attribute.)
     </p>
 
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
+    <p rev="kudu">
+      The information displayed for Kudu tables includes the additional attributes
+      that are only applicable for Kudu tables:
+    </p>
+    <ul rev="kudu">
+      <li>
+        Whether or not the column is part of the primary key. Every Kudu table
+        has a <codeph>true</codeph> value here for at least one column. There
+        could be multiple <codeph>true</codeph> values, for tables with
+        composite primary keys.
+      </li>
+      <li>
+        Whether or not the column is nullable. Specified by the <codeph>NULL</codeph>
+        or <codeph>NOT NULL</codeph> attributes on the <codeph>CREATE TABLE</codeph> statement.
+        Columns that are part of the primary key are automatically non-nullable.
+      </li>
+      <li>
+        The default value, if any, for the column. Specified by the <codeph>DEFAULT</codeph>
+        attribute on the <codeph>CREATE TABLE</codeph> statement. If the default value is
+        <codeph>NULL</codeph>, that is not indicated in this column. It is implied by
+        <codeph>nullable</codeph> being true and no other default value specified.
+      </li>
+      <li>
+        The encoding used for values in the column. Specified by the <codeph>ENCODING</codeph>
+        attribute on the <codeph>CREATE TABLE</codeph> statement.
+      </li>
+      <li>
+        The compression used for values in the column. Specified by the <codeph>COMPRESSION</codeph>
+        attribute on the <codeph>CREATE TABLE</codeph> statement.
+      </li>
+      <li>
+        The block size (in bytes) used for the underlying Kudu storage layer for the column.
+        Specified by the <codeph>BLOCK_SIZE</codeph> attribute on the <codeph>CREATE TABLE</codeph>
+        statement.
+      </li>
+    </ul>
+
+    <p rev="kudu">
+      The following example shows <codeph>DESCRIBE</codeph> output for a simple Kudu table, with 
+      a single-column primary key and all column attributes left with their default values:
+    </p>
+
+<codeblock rev="kudu">
+describe million_rows;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type   | comment | primary_key | nullable | default_value | encoding      | compression         | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| id   | string |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| s    | string |         | false       | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</codeblock>
+
+    <p rev="kudu">
+      The following example shows <codeph>DESCRIBE</codeph> output for a Kudu table with a
+      two-column primary key, and Kudu-specific attributes applied to some columns:
+    </p>
+
+<codeblock rev="kudu">
+create table kudu_describe_example
+(
+  c1 int, c2 int,
+  c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
+  c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle,
+  primary key(c1,c2)
+)
+partition by hash (c1, c2) partitions 10 stored as kudu;
+
+describe kudu_describe_example;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type   | comment | primary_key | nullable | default_value | encoding      | compression         | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| c1   | int    |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c2   | int    |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c3   | string |         | false       | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c4   | string |         | false       | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c5   | string |         | false       | true     | n/a           | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c6   | string |         | false       | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c7   | bigint |         | false       | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c8   | bigint |         | false       | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c9   | bigint |         | false       | true     | -1            | BIT_SHUFFLE   | DEFAULT_COMPRESSION | 0          |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</codeblock>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_double.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_double.xml b/docs/topics/impala_double.xml
index d229fd4..0dfa137 100644
--- a/docs/topics/impala_double.xml
+++ b/docs/topics/impala_double.xml
@@ -108,6 +108,9 @@ SELECT CAST(1000.5 AS DOUBLE);
 
     <p conref="../shared/impala_common.xml#common/float_double_decimal_caveat"/>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_drop_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_drop_table.xml b/docs/topics/impala_drop_table.xml
index b89368b..318393c 100644
--- a/docs/topics/impala_drop_table.xml
+++ b/docs/topics/impala_drop_table.xml
@@ -155,6 +155,15 @@ drop table temporary.trivial;</codeblock>
       no particular permissions are needed for the associated HDFS files or directories.
     </p>
 
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p rev="kudu">
+      Kudu tables can be managed or external, the same as with HDFS-based
+      tables. For a managed table, the underlying Kudu table and its data
+      are removed by <codeph>DROP TABLE</codeph>. For an external table,
+      the underlying Kudu table and its data remain after a
+      <codeph>DROP TABLE</codeph>.
+    </p>
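The managed-versus-external distinction can be sketched as follows; table names and the property value are illustrative, not from this document:

```sql
-- Managed Kudu table: DROP TABLE removes the underlying
-- Kudu table and its data.
DROP TABLE kudu_managed;

-- External Kudu table: DROP TABLE removes only the Impala
-- mapping; the underlying Kudu table and its data remain.
CREATE EXTERNAL TABLE kudu_ext
  STORED AS KUDU
  TBLPROPERTIES ('kudu.table_name' = 'existing_kudu_table');
DROP TABLE kudu_ext;
```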
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_explain.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_explain.xml b/docs/topics/impala_explain.xml
index 4c41fdb..7c4d444 100644
--- a/docs/topics/impala_explain.xml
+++ b/docs/topics/impala_explain.xml
@@ -234,6 +234,42 @@ EXPLAIN_LEVEL set to extended
       if the source table is partitioned.)
     </p>
 
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p>
+      The <codeph>EXPLAIN</codeph> statement displays equivalent plan
+      information for queries against Kudu tables as for queries
+      against HDFS-based tables.
+    </p>
+
+    <p>
+      To see which predicates Impala can <q>push down</q> to Kudu for
+      efficient evaluation, without transmitting unnecessary rows back
+      to Impala, look for the <codeph>kudu predicates</codeph> item in
+      the scan phase of the query. The label <codeph>kudu predicates</codeph>
+      indicates a condition that can be evaluated efficiently on the Kudu
+      side. The label <codeph>predicates</codeph> in a <codeph>SCAN KUDU</codeph>
+      node indicates a condition that is evaluated by Impala.
+      For example, in a table with primary key column <codeph>X</codeph>
+      and non-primary key column <codeph>Y</codeph>, you can see that
+      some operators in the <codeph>WHERE</codeph> clause are evaluated
+      immediately by Kudu and others are evaluated later by Impala:
+<codeblock>
+EXPLAIN SELECT x,y from kudu_table WHERE
+  x = 1 AND x NOT IN (2,3) AND y = 1
+  AND x IS NOT NULL AND x > 0;
++----------------
+| Explain String
++----------------
+...
+| 00:SCAN KUDU [jrussell.hash_only]
+|    predicates: x IS NOT NULL, x NOT IN (2, 3)
+|    kudu predicates: x = 1, x > 0, y = 1
+</codeblock>
+      Only binary predicates and <codeph>IN</codeph> predicates whose literal
+      values exactly match the types in the Kudu table, and therefore do not
+      require any casting, can be pushed to Kudu.
+    </p>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
     <p>
       <xref href="impala_select.xml#select"/>,

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_float.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_float.xml b/docs/topics/impala_float.xml
index 00d4f4b..12f87b0 100644
--- a/docs/topics/impala_float.xml
+++ b/docs/topics/impala_float.xml
@@ -102,6 +102,9 @@ SELECT CAST(1000.5 AS FLOAT);
 
     <p conref="../shared/impala_common.xml#common/float_double_decimal_caveat"/>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_non_pk_data_type"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_grant.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_grant.xml b/docs/topics/impala_grant.xml
index cdba5e0..f1a0672 100644
--- a/docs/topics/impala_grant.xml
+++ b/docs/topics/impala_grant.xml
@@ -129,6 +129,9 @@ object_type ::= TABLE | DATABASE | SERVER | URI
 
     <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
 
+    <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_sentry_limitations"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
 
     <p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/661921b2/docs/topics/impala_invalidate_metadata.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_invalidate_metadata.xml b/docs/topics/impala_invalidate_metadata.xml
index 97dfb09..b316b0d 100644
--- a/docs/topics/impala_invalidate_metadata.xml
+++ b/docs/topics/impala_invalidate_metadata.xml
@@ -241,6 +241,11 @@ ERROR: AnalysisException: Database does not exist: new_db_from_hive
     <p conref="../shared/impala_common.xml#common/s3_metadata"/>
 
     <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p rev="kudu" conref="../shared/impala_common.xml#common/kudu_blurb"/>
+    <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
+    <p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
+
     <p conref="../shared/impala_common.xml#common/related_info"/>
     <p>
       <xref href="impala_hadoop.xml#intro_metastore"/>,


[9/9] incubator-impala git commit: IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Posted by mj...@apache.org.
IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non-string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner (with the
exception of KUDU-1881), but the issue is that during
planning, Impala assumes the types/nullability of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan tokens, and failing
the query if the column types or nullability changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. Test
cases were added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.
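The planner-side check described above can be sketched as a small, self-contained helper. This is an illustrative Python sketch only (the helper name, the schema representation, and the example schema are hypothetical); the real implementation is the Java change to KuduScanNode.java in the diff below:

```python
# Hypothetical sketch of the planner-side schema validation added by this
# patch. Names and the schema representation are illustrative only; the real
# check lives in fe/src/main/java/org/apache/impala/planner/KuduScanNode.java.

REFRESH_HINT = (". The table metadata in Impala may be outdated and need to "
                "be refreshed.")

def validate_slot(col_name, expected_type, expected_nullable, kudu_schema):
    """Fail the query during planning if Impala's cached metadata for a
    column no longer matches the live Kudu schema, instead of building a
    scan token that could crash the backend on read.

    kudu_schema maps column name -> (type_name, is_nullable)."""
    if col_name not in kudu_schema:
        raise RuntimeError("Column '%s' not found in kudu table%s"
                           % (col_name, REFRESH_HINT))
    actual_type, actual_nullable = kudu_schema[col_name]
    if expected_type != actual_type:
        raise RuntimeError("Column '%s' is type %s but Impala expected %s%s"
                           % (col_name, actual_type, expected_type,
                              REFRESH_HINT))
    if expected_nullable != actual_nullable:
        actual = "nullable" if actual_nullable else "not nullable"
        expected = "nullable" if expected_nullable else "not nullable"
        raise RuntimeError("Column '%s' is %s but Impala expected it to be "
                           "%s%s" % (col_name, actual, expected, REFRESH_HINT))

# Example: 's' was dropped and re-added as an INT column outside Impala,
# while Impala's cached metadata still says STRING.
live_schema = {"a": ("INT", False), "s": ("INT", True)}
try:
    validate_slot("s", "STRING", True, live_schema)
except RuntimeError as e:
    print(e)
```

The point of checking at planning time is that the failure becomes a clean query error with a hint to REFRESH, rather than a backend crash when the scan token's projection no longer matches the data.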

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Reviewed-on: http://gerrit.cloudera.org:8080/5840
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/815c76f9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/815c76f9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/815c76f9

Branch: refs/heads/master
Commit: 815c76f9cbbe6585ebed961da506fc54ce2ef4e3
Parents: d845413
Author: Matthew Jacobs <mj...@cloudera.com>
Authored: Fri Jan 27 16:02:49 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Feb 17 23:09:39 2017 +0000

----------------------------------------------------------------------
 .../org/apache/impala/planner/KuduScanNode.java |  30 ++-
 tests/common/kudu_test_suite.py                 |   7 +
 tests/query_test/test_kudu.py                   | 196 ++++++++++++++++++-
 3 files changed, 227 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/815c76f9/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java b/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
index cdb620c..02506e5 100644
--- a/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
@@ -35,6 +35,7 @@ import org.apache.impala.analysis.SlotRef;
 import org.apache.impala.analysis.StringLiteral;
 import org.apache.impala.analysis.TupleDescriptor;
 import org.apache.impala.catalog.KuduTable;
+import org.apache.impala.catalog.Type;
 import org.apache.impala.common.ImpalaRuntimeException;
 import org.apache.impala.thrift.TExplainLevel;
 import org.apache.impala.thrift.TKuduScanNode;
@@ -146,11 +147,36 @@ public class KuduScanNode extends ScanNode {
     Schema tableSchema = rpcTable.getSchema();
     for (SlotDescriptor desc: getTupleDesc().getSlots()) {
       String colName = desc.getColumn().getName();
+      Type colType = desc.getColumn().getType();
+      ColumnSchema kuduCol = null;
       try {
-        tableSchema.getColumn(colName);
+        kuduCol = tableSchema.getColumn(colName);
       } catch (Exception e) {
         throw new ImpalaRuntimeException("Column '" + colName + "' not found in kudu " +
-            "table " + rpcTable.getName());
+            "table " + rpcTable.getName() + ". The table metadata in Impala may be " +
+            "outdated and need to be refreshed.");
+      }
+
+      Type kuduColType = KuduUtil.toImpalaType(kuduCol.getType());
+      if (!colType.equals(kuduColType)) {
+        throw new ImpalaRuntimeException("Column '" + colName + "' is type " +
+            kuduColType.toSql() + " but Impala expected " + colType.toSql() +
+            ". The table metadata in Impala may be outdated and need to be refreshed.");
+      }
+
+      if (desc.getIsNullable() != kuduCol.isNullable()) {
+        String expected;
+        String actual;
+        if (desc.getIsNullable()) {
+          expected = "nullable";
+          actual = "not nullable";
+        } else {
+          expected = "not nullable";
+          actual = "nullable";
+        }
+        throw new ImpalaRuntimeException("Column '" + colName + "' is " + actual +
+            " but Impala expected it to be " + expected +
+            ". The table metadata in Impala may be outdated and need to be refreshed.");
       }
     }
   }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/815c76f9/tests/common/kudu_test_suite.py
----------------------------------------------------------------------
diff --git a/tests/common/kudu_test_suite.py b/tests/common/kudu_test_suite.py
index e9763ed..3e438d0 100644
--- a/tests/common/kudu_test_suite.py
+++ b/tests/common/kudu_test_suite.py
@@ -83,6 +83,13 @@ class KuduTestSuite(ImpalaTestSuite):
     return "".join(choice(string.lowercase) for _ in xrange(10))
 
   @classmethod
+  def to_kudu_table_name(cls, db_name, tbl_name):
+    """Return the name of the underlying Kudu table, from the Impala database and table
+    name. This must be kept in sync with KuduUtil.getDefaultCreateKuduTableName() in the
+    FE."""
+    return "impala::%s.%s" % (db_name, tbl_name)
+
+  @classmethod
   def get_kudu_table_base_name(cls, name):
     return name.split(".")[-1]
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/815c76f9/tests/query_test/test_kudu.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_kudu.py b/tests/query_test/test_kudu.py
index b7235e8..50db7ee 100644
--- a/tests/query_test/test_kudu.py
+++ b/tests/query_test/test_kudu.py
@@ -95,14 +95,202 @@ class TestKuduOperations(KuduTestSuite):
                   encoding, compression, default, blocksize, nullable, encoding,
                   compression, default, blocksize))
               indx = indx + 1
-              kudu_tbl_name = "impala::%s.%s" % (unique_database, impala_tbl_name)
-              assert kudu_client.table_exists(kudu_tbl_name)
+              assert kudu_client.table_exists(
+                  KuduTestSuite.to_kudu_table_name(unique_database, impala_tbl_name))
+
+  def test_kudu_col_changed(self, cursor, kudu_client, unique_database):
+    """Test changing a Kudu column outside of Impala results in a failure on read with
+       outdated metadata (IMPALA-4828)."""
+    cursor.execute("""CREATE TABLE %s.foo (a INT PRIMARY KEY, s STRING)
+        PARTITION BY HASH(a) PARTITIONS 3 STORED AS KUDU""" % unique_database)
+    assert kudu_client.table_exists(
+        KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+
+    # Force metadata to be loaded on impalads
+    cursor.execute("select * from %s.foo" % (unique_database))
+
+    # Load the table via the Kudu client and change col 's' to be a different type.
+    table = kudu_client.table(KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.drop_column("s")
+    table = alterer.alter()
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.add_column("s", "int32")
+    table = alterer.alter()
+
+    # Add some rows
+    session = kudu_client.new_session()
+    for i in range(100):
+      op = table.new_insert((i, i))
+      session.apply(op)
+    session.flush()
+
+    # Scanning should result in an error
+    try:
+      cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+      assert False
+    except Exception as e:
+      expected_error = "Column 's' is type INT but Impala expected STRING. The table "\
+          "metadata in Impala may be outdated and need to be refreshed."
+      assert expected_error in str(e)
+
+    # After a REFRESH the scan should succeed
+    cursor.execute("REFRESH %s.foo" % (unique_database))
+    cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+    assert len(cursor.fetchall()) == 100
+
+  def test_kudu_col_not_null_changed(self, cursor, kudu_client, unique_database):
+    """Test changing a NOT NULL Kudu column outside of Impala results in a failure
+       on read with outdated metadata (IMPALA-4828)."""
+    cursor.execute("""CREATE TABLE %s.foo (a INT PRIMARY KEY, s STRING NOT NULL)
+        PARTITION BY HASH(a) PARTITIONS 3 STORED AS KUDU""" % unique_database)
+    assert kudu_client.table_exists(
+        KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+
+    # Force metadata to be loaded on impalads
+    cursor.execute("select * from %s.foo" % (unique_database))
+
+    # Load the table via the Kudu client and change col 's' from NOT NULL to nullable.
+    table = kudu_client.table(KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.drop_column("s")
+    table = alterer.alter()
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.add_column("s", "string", nullable=True)
+    table = alterer.alter()
+
+    # Add some rows
+    session = kudu_client.new_session()
+    for i in range(100):
+      op = table.new_insert((i, None))
+      session.apply(op)
+    session.flush()
+
+    # Scanning should result in an error
+    try:
+      cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+      assert False
+    except Exception as e:
+      expected_error = "Column 's' is nullable but Impala expected it to be "\
+          "not nullable. The table metadata in Impala may be outdated and need to be "\
+          "refreshed."
+      assert expected_error in str(e)
+
+    # After a REFRESH the scan should succeed
+    cursor.execute("REFRESH %s.foo" % (unique_database))
+    cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+    assert len(cursor.fetchall()) == 100
+
+  def test_kudu_col_null_changed(self, cursor, kudu_client, unique_database):
+    """Test changing a NULL Kudu column outside of Impala results in a failure
+       on read with outdated metadata (IMPALA-4828)."""
+    cursor.execute("""CREATE TABLE %s.foo (a INT PRIMARY KEY, s STRING NULL)
+        PARTITION BY HASH(a) PARTITIONS 3 STORED AS KUDU""" % unique_database)
+    assert kudu_client.table_exists(
+        KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+
+    # Force metadata to be loaded on impalads
+    cursor.execute("select * from %s.foo" % (unique_database))
+
+    # Load the table via the Kudu client and change col 's' from nullable to NOT NULL.
+    table = kudu_client.table(KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.drop_column("s")
+    table = alterer.alter()
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.add_column("s", "string", nullable=False, default="bar")
+    table = alterer.alter()
+
+    # Add some rows
+    session = kudu_client.new_session()
+    for i in range(100):
+      op = table.new_insert((i, None))
+      session.apply(op)
+    session.flush()
+
+    # Scanning should result in an error
+    try:
+      cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+      assert False
+    except Exception as e:
+      expected_error = "Column 's' is not nullable but Impala expected it to be "\
+          "nullable. The table metadata in Impala may be outdated and need to be "\
+          "refreshed."
+      assert expected_error in str(e)
+
+    # After a REFRESH the scan should succeed
+    cursor.execute("REFRESH %s.foo" % (unique_database))
+    cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+    assert len(cursor.fetchall()) == 100
+
+  def test_kudu_col_added(self, cursor, kudu_client, unique_database):
+    """Test adding a Kudu column outside of Impala."""
+    cursor.execute("""CREATE TABLE %s.foo (a INT PRIMARY KEY)
+        PARTITION BY HASH(a) PARTITIONS 3 STORED AS KUDU""" % unique_database)
+    assert kudu_client.table_exists(
+        KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+
+    # Force metadata to be loaded on impalads
+    cursor.execute("select * from %s.foo" % (unique_database))
+
+    # Load the table via the Kudu client and add a new col
+    table = kudu_client.table(KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.add_column("b", "int32")
+    table = alterer.alter()
+
+    # Add some rows
+    session = kudu_client.new_session()
+    op = table.new_insert((0, 0))
+    session.apply(op)
+    session.flush()
+
+    # Only the first col is visible to Impala. Impala will not know about the newly
+    # added column, so '*' is expanded to the known columns only. There is no separate
+    # check for this case because the query can proceed; detecting the new column would
+    # require fetching metadata from the Kudu master, which is what REFRESH is for.
+    cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+    assert cursor.fetchall() == [(0, )]
+
+    # After a REFRESH both cols should be visible
+    cursor.execute("REFRESH %s.foo" % (unique_database))
+    cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+    assert cursor.fetchall() == [(0, 0)]
+
+  def test_kudu_col_removed(self, cursor, kudu_client, unique_database):
+    """Test removing a Kudu column outside of Impala."""
+    cursor.execute("""CREATE TABLE %s.foo (a INT PRIMARY KEY, s STRING)
+        PARTITION BY HASH(a) PARTITIONS 3 STORED AS KUDU""" % unique_database)
+    assert kudu_client.table_exists(
+        KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+
+    # Force metadata to be loaded on impalads
+    cursor.execute("select * from %s.foo" % (unique_database))
+    cursor.execute("insert into %s.foo values (0, 'foo')" % (unique_database))
+
+    # Load the table via the Kudu client and drop col 's'.
+    table = kudu_client.table(KuduTestSuite.to_kudu_table_name(unique_database, "foo"))
+    alterer = kudu_client.new_table_alterer(table)
+    alterer.drop_column("s")
+    table = alterer.alter()
+
+    # Scanning should result in an error
+    try:
+      cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+      assert False
+    except Exception as e:
+      expected_error = "Column 's' not found in kudu table impala::test_kudu_col_removed"
+      assert expected_error in str(e)
+
+    # After a REFRESH the scan should succeed
+    cursor.execute("REFRESH %s.foo" % (unique_database))
+    cursor.execute("SELECT * FROM %s.foo" % (unique_database))
+    assert cursor.fetchall() == [(0, )]
 
   def test_kudu_rename_table(self, cursor, kudu_client, unique_database):
     """Test Kudu table rename"""
     cursor.execute("""CREATE TABLE %s.foo (a INT PRIMARY KEY) PARTITION BY HASH(a)
         PARTITIONS 3 STORED AS KUDU""" % unique_database)
-    kudu_tbl_name = "impala::%s.foo" % unique_database
+    kudu_tbl_name = KuduTestSuite.to_kudu_table_name(unique_database, "foo")
     assert kudu_client.table_exists(kudu_tbl_name)
     new_kudu_tbl_name = "blah"
     cursor.execute("ALTER TABLE %s.foo SET TBLPROPERTIES('kudu.table_name'='%s')" % (
@@ -546,7 +734,7 @@ class TestImpalaKuduIntegration(KuduTestSuite):
     impala_tbl_name = "foo"
     cursor.execute("""CREATE TABLE %s.%s (a INT PRIMARY KEY) PARTITION BY HASH (a)
         PARTITIONS 3 STORED AS KUDU""" % (unique_database, impala_tbl_name))
-    kudu_tbl_name = "impala::%s.%s" % (unique_database, impala_tbl_name)
+    kudu_tbl_name = KuduTestSuite.to_kudu_table_name(unique_database, impala_tbl_name)
     assert kudu_client.table_exists(kudu_tbl_name)
     kudu_client.delete_table(kudu_tbl_name)
     assert not kudu_client.table_exists(kudu_tbl_name)