Posted to commits@impala.apache.org by jr...@apache.org on 2016/07/26 23:05:07 UTC
[15/22] incubator-impala git commit: First try at porting over the
source files necessary for the Impala SQL Reference.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_function.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_function.xml b/docs/topics/impala_create_function.xml
new file mode 100644
index 0000000..4140289
--- /dev/null
+++ b/docs/topics/impala_create_function.xml
@@ -0,0 +1,291 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.2" id="create_function">
+
+ <title>CREATE FUNCTION Statement</title>
+ <titlealts><navtitle>CREATE FUNCTION</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="DDL"/>
+ <data name="Category" value="Schemas"/>
+ <data name="Category" value="Impala Functions"/>
+ <data name="Category" value="UDFs"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">CREATE FUNCTION statement</indexterm>
+ Creates a user-defined function (UDF), which you can use to implement custom logic during
+ <codeph>SELECT</codeph> or <codeph>INSERT</codeph> operations.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+ <p>
+ The syntax is different depending on whether you create a scalar UDF, which is called once for each row and
+ implemented by a single function, or a user-defined aggregate function (UDA), which is implemented by
+ multiple functions that compute intermediate results across sets of rows.
+ </p>
+
+ <p>
+ To create a scalar UDF, issue a <codeph>CREATE FUNCTION</codeph> statement:
+ </p>
+
+<codeblock>CREATE FUNCTION [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>function_name</varname>([<varname>arg_type</varname>[, <varname>arg_type</varname>...]])
+ RETURNS <varname>return_type</varname>
+ LOCATION '<varname>hdfs_path</varname>'
+ SYMBOL='<varname>symbol_or_class</varname>'</codeblock>
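+
+ <p>
+ For example, the following statement registers a hypothetical scalar UDF; the
+ function name, library path, and symbol name are illustrative only:
+ </p>
+
+<codeblock>CREATE FUNCTION my_lower(STRING)
+ RETURNS STRING
+ LOCATION '/user/hive/udfs/libmyudfs.so'
+ SYMBOL='MyLower';</codeblock>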
+
+ <p>
+ To create a UDA, issue a <codeph>CREATE AGGREGATE FUNCTION</codeph> statement:
+ </p>
+
+<codeblock>CREATE [AGGREGATE] FUNCTION [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>function_name</varname>([<varname>arg_type</varname>[, <varname>arg_type</varname>...]])
+ RETURNS <varname>return_type</varname>
+ LOCATION '<varname>hdfs_path</varname>'
+ [INIT_FN='<varname>function</varname>']
+ UPDATE_FN='<varname>function</varname>'
+ MERGE_FN='<varname>function</varname>'
+ [PREPARE_FN='<varname>function</varname>']
+ [CLOSE_FN='<varname>function</varname>']
+ <ph rev="2.0.0">[SERIALIZE_FN='<varname>function</varname>']</ph>
+ [FINALIZE_FN='<varname>function</varname>']
+<!-- [INTERMEDIATE <varname>type_spec</varname>] --></codeblock>
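+
+ <p>
+ For example, the following statement registers a hypothetical UDA with all the
+ phase functions spelled out; the function names, library path, and signature are
+ illustrative only:
+ </p>
+
+<codeblock>CREATE AGGREGATE FUNCTION sum_of_squares(BIGINT)
+ RETURNS BIGINT
+ LOCATION '/user/hive/udfs/libudasample.so'
+ INIT_FN='SumOfSquaresInit'
+ UPDATE_FN='SumOfSquaresUpdate'
+ MERGE_FN='SumOfSquaresMerge'
+ FINALIZE_FN='SumOfSquaresFinalize';</codeblock>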
+
+ <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+ <p>
+ <b>Varargs notation:</b>
+ </p>
+
+ <p>
+ If the underlying implementation of your function accepts a variable number of arguments:
+ </p>
+
+ <ul>
+ <li>
+ The variable arguments must go last in the argument list.
+ </li>
+
+ <li>
+ The variable arguments must all be of the same type.
+ </li>
+
+ <li>
+ You must include at least one instance of the variable arguments in every function call invoked from SQL.
+ </li>
+
+ <li>
+ You designate the variable portion of the argument list in the <codeph>CREATE FUNCTION</codeph> statement
+ by including <codeph>...</codeph> immediately after the type name of the first variable argument. For
+ example, to create a function that accepts an <codeph>INT</codeph> argument, followed by a
+ <codeph>BOOLEAN</codeph>, followed by one or more <codeph>STRING</codeph> arguments, your <codeph>CREATE
+ FUNCTION</codeph> statement would look like:
+<codeblock>CREATE FUNCTION <varname>func_name</varname> (INT, BOOLEAN, STRING ...)
+ RETURNS <varname>type</varname> LOCATION '<varname>path</varname>' SYMBOL='<varname>entry_point</varname>';
+</codeblock>
+ </li>
+ </ul>
+
+ <p>
+ See <xref href="impala_udf.xml#udf_varargs"/> for how to code the C++ or Java function to accept
+ variable-length argument lists.
+ </p>
+
+ <p>
+ <b>Scalar and aggregate functions:</b>
+ </p>
+
+ <p>
+ The simplest kind of user-defined function returns a single scalar value each time it is called, typically
+ once for each row in the result set. This general kind of function is what is usually meant by UDF.
+ User-defined aggregate functions (UDAs) are a specialized kind of UDF that produce a single value based on
+ the contents of multiple rows. You usually use UDAs in combination with a <codeph>GROUP BY</codeph> clause to
+ condense a large result set into a smaller one, or even a single row summarizing column values across an
+ entire table.
+ </p>
+
+ <p>
+ You create UDAs by using the <codeph>CREATE AGGREGATE FUNCTION</codeph> syntax. The clauses
+ <codeph>INIT_FN</codeph>, <codeph>UPDATE_FN</codeph>, <codeph>MERGE_FN</codeph>,
+ <ph rev="2.0.0"><codeph>SERIALIZE_FN</codeph>,</ph> <codeph>FINALIZE_FN</codeph>, and
+ <codeph>INTERMEDIATE</codeph> only apply when you create a UDA rather than a scalar UDF.
+ </p>
+
+ <p>
+ The <codeph>*_FN</codeph> clauses specify functions to call at different phases of function processing.
+ </p>
+
+ <ul>
+ <li>
+ <b>Initialize:</b> The function you specify with the <codeph>INIT_FN</codeph> clause does any initial
+ setup, such as initializing member variables in internal data structures. This function is often a stub for
+ simple UDAs. You can omit this clause and a default (no-op) function will be used.
+ </li>
+
+ <li>
+ <b>Update:</b> The function you specify with the <codeph>UPDATE_FN</codeph> clause is called once for each
+ row in the original result set, that is, before any <codeph>GROUP BY</codeph> clause is applied. A separate
+ instance of the function is called for each different value returned by the <codeph>GROUP BY</codeph>
+ clause. The final argument passed to this function is a pointer, to which you write an updated value based
+ on its original value and the value of the first argument.
+ </li>
+
+ <li>
+ <b>Merge:</b> The function you specify with the <codeph>MERGE_FN</codeph> clause is called an arbitrary
+ number of times, to combine intermediate values produced by different nodes or different threads as Impala
+ reads and processes data files in parallel. The final argument passed to this function is a pointer, to
+ which you write an updated value based on its original value and the value of the first argument.
+ </li>
+
+ <li rev="2.0.0">
+ <b>Serialize:</b> The function you specify with the <codeph>SERIALIZE_FN</codeph> clause frees memory
+ allocated to intermediate results. It is required if any memory was allocated by the Allocate function in
+ the Init, Update, or Merge functions, or if the intermediate type contains any pointers. See
+ <xref href="https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc" scope="external" format="html">the
+ UDA code samples</xref> for details.
+ </li>
+
+ <li>
+ <b>Finalize:</b> The function you specify with the <codeph>FINALIZE_FN</codeph> clause does any required
+ teardown for resources acquired by your UDF, such as freeing memory, closing file handles if you explicitly
+ opened any files, and so on. This function is often a stub for simple UDAs. You can omit this clause and a
+ default (no-op) function will be used. It is required in UDAs where the final return type is different from
+ the intermediate type, or if any memory was allocated by the Allocate function in the Init, Update, or
+ Merge functions. See
+ <xref href="https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc" scope="external" format="html">the
+ UDA code samples</xref> for details.
+ </li>
+ </ul>
+
+ <p>
+ If you use a consistent naming convention for each of the underlying functions, Impala can automatically
+ determine the names based on the first such clause, so the others are optional.
+ </p>
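+
+ <p>
+ For example, if the update function in the shared library were named
+ <codeph>SumOfSquaresUpdate</codeph>, you could specify only the
+ <codeph>UPDATE_FN</codeph> clause and let Impala locate companion functions with
+ names such as <codeph>SumOfSquaresInit</codeph>, <codeph>SumOfSquaresMerge</codeph>,
+ and <codeph>SumOfSquaresFinalize</codeph>. (The function and library names here are
+ illustrative only.)
+ </p>
+
+<codeblock>CREATE AGGREGATE FUNCTION sum_of_squares(BIGINT)
+ RETURNS BIGINT
+ LOCATION '/user/hive/udfs/libudasample.so'
+ UPDATE_FN='SumOfSquaresUpdate';</codeblock>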
+
+ <p audience="Cloudera">
+ The <codeph>INTERMEDIATE</codeph> clause specifies the data type of intermediate values passed from the
+ <q>update</q> phase to the <q>merge</q> phase, and from the <q>merge</q> phase to the <q>finalize</q> phase.
+ You can use any of the existing Impala data types, or the special notation
+ <codeph>CHAR(<varname>n</varname>)</codeph> to allocate a scratch area of <varname>n</varname> bytes for the
+ intermediate result. For example, if the different phases of your UDA pass strings to each other but in the
+ end the function returns a <codeph>BIGINT</codeph> value, you would specify <codeph>INTERMEDIATE
+ STRING</codeph>. Likewise, if the different phases of your UDA pass 2 separate <codeph>BIGINT</codeph> values
+ between them (8 bytes each), you would specify <codeph>INTERMEDIATE CHAR(16)</codeph> so that each function
+ could read from and write to a 16-byte buffer.
+ </p>
+
+ <p>
+ For end-to-end examples of UDAs, see <xref href="impala_udf.xml#udfs"/>.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/udfs_no_complex_types"/>
+
+ <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+ <ul>
+ <li>
+ You can write Impala UDFs in either C++ or Java. C++ UDFs are new to Impala, and are recommended for the
+ best performance, because they are compiled to native code. Java-based UDFs are compatible between Impala
+ and Hive, and are most suited to reusing existing Hive UDFs. (Impala can run Java-based Hive UDFs but not
+ Hive UDAs.)
+ </li>
+
+ <li>
+ The body of the UDF is represented by a <codeph>.so</codeph> or <codeph>.jar</codeph> file, which you store
+ in HDFS and the <codeph>CREATE FUNCTION</codeph> statement distributes to each Impala node.
+ </li>
+
+ <li>
+ Impala calls the underlying code during SQL statement evaluation, as many times as needed to process all
+ the rows from the result set. All UDFs are assumed to be deterministic, that is, to always return the same
+ result when passed the same argument values. Impala might or might not skip some invocations of a UDF if
+ the result value is already known from a previous call. Therefore, do not rely on the UDF being called a
+ specific number of times, and do not return different result values based on some external factor such as
+ the current time, a random number function, or an external data source that could be updated while an
+ Impala query is in progress.
+ </li>
+
+ <li>
+ The names of the function arguments in the UDF are not significant, only their number, positions, and data
+ types.
+ </li>
+
+ <li>
+ You can overload the same function name by creating multiple versions of the function, each with a
+ different argument signature. For security reasons, you cannot make a UDF with the same name as any
+ built-in function.
+ </li>
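+
+ <li>
+ For example, the following statements create two overloaded versions of a
+ hypothetical <codeph>padded</codeph> function, distinguished by their argument
+ signatures (the names and paths are illustrative only):
+<codeblock>CREATE FUNCTION padded(STRING) RETURNS STRING
+ LOCATION '/user/hive/udfs/libmyudfs.so' SYMBOL='PaddedDefault';
+CREATE FUNCTION padded(STRING, INT) RETURNS STRING
+ LOCATION '/user/hive/udfs/libmyudfs.so' SYMBOL='PaddedWidth';</codeblock>
+ </li>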
+
+ <li>
+ In the UDF code, you represent the function return result as a <codeph>struct</codeph>. This
+ <codeph>struct</codeph> contains two fields. The first field is a <codeph>boolean</codeph> representing
+ whether the value is <codeph>NULL</codeph> or not. (When this field is <codeph>true</codeph>, the return
+ value is interpreted as <codeph>NULL</codeph>.) The second field is the same type as the specified function
+ return type, and holds the return value when the function returns something other than
+ <codeph>NULL</codeph>.
+ </li>
+
+ <li>
+ In the UDF code, you represent the function arguments as an initial pointer to a UDF context structure,
+ followed by references to zero or more <codeph>struct</codeph>s, corresponding to each of the arguments.
+ Each <codeph>struct</codeph> has the same two fields as with the return value: a <codeph>boolean</codeph>
+ field representing whether the argument is <codeph>NULL</codeph>, and a field of the appropriate type
+ holding any non-<codeph>NULL</codeph> argument value.
+ </li>
+
+ <li>
+ For sample code and build instructions for UDFs,
+ see <xref href="https://github.com/cloudera/impala/tree/master/be/src/udf_samples" scope="external" format="html">the sample UDFs in the Impala github repo</xref>.
+ </li>
+
+ <li>
+ Because the file representing the body of the UDF is stored in HDFS, it is automatically available to all
+ the Impala nodes. You do not need to manually copy any UDF-related files between servers.
+ </li>
+
+ <li>
+ Because Impala currently does not have any <codeph>ALTER FUNCTION</codeph> statement, if you need to rename
+ a function, move it to a different database, or change its signature or other properties, issue a
+ <codeph>DROP FUNCTION</codeph> statement for the original function followed by a <codeph>CREATE
+ FUNCTION</codeph> with the desired properties.
+ </li>
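+
+ <li>
+ For example, to rename a hypothetical function <codeph>my_func</codeph> to
+ <codeph>my_func_v2</codeph>, you might issue (the signature, path, and symbol
+ shown are illustrative only):
+<codeblock>DROP FUNCTION my_func(INT);
+CREATE FUNCTION my_func_v2(INT) RETURNS INT
+ LOCATION '/user/hive/udfs/libmyudfs.so' SYMBOL='MyFunc';</codeblock>
+ </li>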
+
+ <li>
+ Because each UDF is associated with a particular database, either issue a <codeph>USE</codeph> statement
+ before doing any <codeph>CREATE FUNCTION</codeph> statements, or specify the name of the function as
+ <codeph><varname>db_name</varname>.<varname>function_name</varname></codeph>.
+ </li>
+ </ul>
+
+ <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+ <p>
+ Impala can run UDFs that were created through Hive, as long as they refer to Impala-compatible data types
+ (not composite or nested column types). Hive can run Java-based UDFs that were created through Impala, but
+ not Impala UDFs written in C++.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/udf_persistence_restriction"/>
+
+ <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+ <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+
+ <p>
+ <xref href="impala_udf.xml#udfs"/> for more background information, usage instructions, and examples for
+ Impala UDFs; <xref href="impala_drop_function.xml#drop_function"/>
+ </p>
+ </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_role.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_role.xml b/docs/topics/impala_create_role.xml
new file mode 100644
index 0000000..975ce15
--- /dev/null
+++ b/docs/topics/impala_create_role.xml
@@ -0,0 +1,66 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.4.0" id="create_role">
+
+ <title>CREATE ROLE Statement (CDH 5.2 or higher only)</title>
+ <titlealts><navtitle>CREATE ROLE (CDH 5.2 or higher only)</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="DDL"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Sentry"/>
+ <data name="Category" value="Roles"/>
+ <!-- Consider whether to go deeper into categories like Security for the Sentry-related statements. -->
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">CREATE ROLE statement</indexterm>
+<!-- Copied from Sentry docs. Turn into conref. -->
+ The <codeph>CREATE ROLE</codeph> statement creates a role to which privileges can be granted. Privileges can
+ be granted to roles, which can then be assigned to users. A user that has been assigned a role will only be
+ able to exercise the privileges of that role. Only users that have administrative privileges can create/drop
+ roles. By default, the <codeph>hive</codeph>, <codeph>impala</codeph> and <codeph>hue</codeph> users have
+ administrative privileges in Sentry.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>CREATE ROLE <varname>role_name</varname>
+</codeblock>
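+
+ <p>
+ For example, you might create a role and then grant it to a group with the
+ <codeph>GRANT ROLE</codeph> statement (the role and group names here are
+ illustrative only):
+ </p>
+
+<codeblock>CREATE ROLE analyst_role;
+GRANT ROLE analyst_role TO GROUP analysts;</codeblock>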
+
+ <p conref="../shared/impala_common.xml#common/privileges_blurb"/>
+
+ <p>
+ Only administrative users (those with <codeph>ALL</codeph> privileges on the server, defined in the Sentry
+ policy file) can use this statement.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+ <p>
+ Impala makes use of any roles and privileges specified by the <codeph>GRANT</codeph> and
+ <codeph>REVOKE</codeph> statements in Hive, and Hive makes use of any roles and privileges specified by the
+ <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Impala. The Impala <codeph>GRANT</codeph>
+ and <codeph>REVOKE</codeph> statements for privileges do not require the <codeph>ROLE</codeph> keyword to be
+ repeated before each role name, unlike the equivalent Hive statements.
+ </p>
+
+<!-- To do: nail down the new SHOW syntax, e.g. SHOW ROLES, SHOW CURRENT ROLES, SHOW GROUPS. -->
+
+ <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+ <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+
+ <p>
+ <xref href="impala_authorization.xml#authorization"/>, <xref href="impala_grant.xml#grant"/>,
+ <xref href="impala_revoke.xml#revoke"/>, <xref href="impala_drop_role.xml#drop_role"/>,
+ <xref href="impala_show.xml#show"/>
+ </p>
+ </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_table.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_table.xml b/docs/topics/impala_create_table.xml
new file mode 100644
index 0000000..cdaee4a
--- /dev/null
+++ b/docs/topics/impala_create_table.xml
@@ -0,0 +1,650 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="create_table">
+
+ <title>CREATE TABLE Statement</title>
+ <titlealts><navtitle>CREATE TABLE</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="DDL"/>
+ <data name="Category" value="Impala Data Types"/>
+ <data name="Category" value="HDFS Caching"/>
+ <data name="Category" value="Tables"/>
+ <data name="Category" value="Schemas"/>
+ <data audience="impala_next" name="Category" value="Kudu"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">CREATE TABLE statement</indexterm>
+ Creates a new table and specifies its characteristics. While creating a table, you optionally specify aspects
+ such as:
+ </p>
+
+ <ul>
+ <li>
+ Whether the table is internal or external.
+ </li>
+
+ <li>
+ The columns and associated data types.
+ </li>
+
+ <li>
+ The columns used for physically partitioning the data.
+ </li>
+
+ <li>
+ The file format for data files.
+ </li>
+
+ <li>
+ The HDFS directory where the data files are located.
+ </li>
+ </ul>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+ <p>
+ The general syntax for creating a table and specifying its columns is as follows:
+ </p>
+
+ <p>
+ <b>Explicit column definitions:</b>
+ </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+ [(<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+ [COMMENT '<varname>table_comment</varname>']
+ [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+ [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+ [
+ [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+ ]
+ [LOCATION '<varname>hdfs_path</varname>']
+ [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0"> [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+</codeblock>
+
+ <p>
+ <b>Column definitions inferred from data file:</b>
+ </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+ LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'
+ [COMMENT '<varname>table_comment</varname>']
+ [PARTITIONED BY (<varname>col_name</varname> <varname>data_type</varname> [COMMENT '<varname>col_comment</varname>'], ...)]
+ [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+ [
+ [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+ ]
+ [LOCATION '<varname>hdfs_path</varname>']
+ [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0"> [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+data_type:
+ <varname>primitive_type</varname>
+ | array_type
+ | map_type
+ | struct_type
+</codeblock>
+
+ <p>
+ <b>CREATE TABLE AS SELECT:</b>
+ </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+ [COMMENT '<varname>table_comment</varname>']
+ [WITH SERDEPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+ [
+ [ROW FORMAT <varname>row_format</varname>] [STORED AS <varname>file_format</varname>]
+ ]
+ [LOCATION '<varname>hdfs_path</varname>']
+ [TBLPROPERTIES ('<varname>key1</varname>'='<varname>value1</varname>', '<varname>key2</varname>'='<varname>value2</varname>', ...)]
+<ph rev="1.4.0"> [CACHED IN '<varname>pool_name</varname>'</ph> <ph rev="2.2.0">[WITH REPLICATION = <varname>integer</varname>]</ph> | UNCACHED]
+AS
+ <varname>select_statement</varname></codeblock>
+
+<codeblock>primitive_type:
+ TINYINT
+ | SMALLINT
+ | INT
+ | BIGINT
+ | BOOLEAN
+ | FLOAT
+ | DOUBLE
+ <ph rev="1.4.0">| DECIMAL</ph>
+ | STRING
+ <ph rev="2.0.0">| CHAR</ph>
+ <ph rev="2.0.0">| VARCHAR</ph>
+ | TIMESTAMP
+
+<ph rev="2.3.0">complex_type:
+ struct_type
+ | array_type
+ | map_type
+
+struct_type: STRUCT < <varname>name</varname> : <varname>primitive_or_complex_type</varname> [COMMENT '<varname>comment_string</varname>'], ... >
+
+array_type: ARRAY < <varname>primitive_or_complex_type</varname> >
+
+map_type: MAP < <varname>primitive_type</varname>, <varname>primitive_or_complex_type</varname> >
+</ph>
+row_format:
+ DELIMITED [FIELDS TERMINATED BY '<varname>char</varname>' [ESCAPED BY '<varname>char</varname>']]
+ [LINES TERMINATED BY '<varname>char</varname>']
+
+file_format:
+ PARQUET
+ | TEXTFILE
+ | AVRO
+ | SEQUENCEFILE
+ | RCFILE
+</codeblock>
+
+ <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+ <!-- Should really have some info up front about all the data types and file formats.
+ Consider adding here, or at least making inline links to the relevant keywords
+ in the syntax spec above. -->
+
+ <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+ <p rev="2.3.0">
+ The Impala complex types (<codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>, or <codeph>MAP</codeph>)
+ are available in CDH 5.5 / Impala 2.3 and higher.
+ Because you can nest these types (for example, to make an array of maps or a struct
+ with an array field), these types are also sometimes referred to as nested types.
+ See <xref href="impala_complex_types.xml#complex_types"/> for usage details.
+ </p>
+
+ <!-- This is kind of an obscure and rare usage scenario. Consider moving all the complex type stuff further down
+ after some of the more common clauses. -->
+ <p rev="2.3.0">
+ Impala can create tables containing complex type columns, with any supported file format.
+ Because currently Impala can only query complex type columns in Parquet tables, creating
+ tables with complex type columns and other file formats such as text is of limited use.
+ For example, you might create a text table including some columns with complex types with Impala, and use Hive
+ as part of your ETL pipeline to ingest the nested type data and copy it to an identical Parquet table.
+ Or you might create a partitioned table containing complex type columns using one file format, and
+ use <codeph>ALTER TABLE</codeph> to change the file format of individual partitions to Parquet; Impala
+ can then query only the Parquet-format partitions in that table.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/complex_types_partitioning"/>
+
+ <p>
+ <b>Internal and external tables (EXTERNAL and LOCATION clauses):</b>
+ </p>
+
+ <p>
+ By default, Impala creates an <q>internal</q> table, where Impala manages the underlying data files for the
+ table, and physically deletes the data files when you drop the table. If you specify the
+ <codeph>EXTERNAL</codeph> clause, Impala treats the table as an <q>external</q> table, where the data files
+ are typically produced outside Impala and queried from their original locations in HDFS, and Impala leaves
+ the data files in place when you drop the table. For details about internal and external tables, see
+ <xref href="impala_tables.xml#tables"/>.
+ </p>
+
+ <p>
+ Typically, for an external table you include a <codeph>LOCATION</codeph> clause to specify the path to the
+ HDFS directory where Impala reads and writes files for the table. For example, if your data pipeline produces
+ Parquet files in the HDFS directory <filepath>/user/etl/destination</filepath>, you might create an external
+ table as follows:
+ </p>
+
+<codeblock>CREATE EXTERNAL TABLE external_parquet (c1 INT, c2 STRING, c3 TIMESTAMP)
+ STORED AS PARQUET LOCATION '/user/etl/destination';
+</codeblock>
+
+ <p>
+ Although the <codeph>EXTERNAL</codeph> and <codeph>LOCATION</codeph> clauses are often specified together,
+ <codeph>LOCATION</codeph> is optional for external tables, and you can also specify <codeph>LOCATION</codeph>
+ for internal tables. The difference is all about whether Impala <q>takes control</q> of the underlying data
+ files and moves them when you rename the table, or deletes them when you drop the table. For more about
+ internal and external tables and how they interact with the <codeph>LOCATION</codeph> attribute, see
+ <xref href="impala_tables.xml#tables"/>.
+ </p>
+
+ <p>
+ <b>Partitioned tables (PARTITIONED BY clause):</b>
+ </p>
+
+ <p>
+ The <codeph>PARTITIONED BY</codeph> clause divides the data files based on the values from one or more
+ specified columns. Impala queries can use the partition metadata to minimize the amount of data that is read
+ from disk or transmitted across the network, particularly during join queries. For details about
+ partitioning, see <xref href="impala_partitioning.xml#partitioning"/>.
+ </p>
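+
+ <p>
+ For example, the following statement creates a hypothetical table whose data
+ files are physically divided by year and month:
+ </p>
+
+<codeblock>CREATE TABLE census (name STRING, address STRING)
+ PARTITIONED BY (year SMALLINT, month TINYINT)
+ STORED AS PARQUET;</codeblock>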
+
+ <p rev="kudu" audience="impala_next">
+ <b>Partitioning for Kudu tables (DISTRIBUTE BY clause)</b>
+ </p>
+
+ <p rev="kudu" audience="impala_next">
+ For Kudu tables, you specify logical partitioning across one or more columns using the
+ <codeph>DISTRIBUTE BY</codeph> clause. In contrast to partitioning for HDFS-based tables,
+ multiple values for a partition key column can be located in the same partition.
+ The optional <codeph>HASH</codeph> clause lets you divide one or a set of partition key columns
+ into a specified number of buckets; you can use more than one <codeph>HASH</codeph>
+ clause, specifying a distinct set of partition key columns for each.
+ The optional <codeph>RANGE</codeph> clause further subdivides the partitions, based on
+ a set of literal values for the partition key columns.
+ </p>
+
+ <p>
+ <b>Specifying file format (STORED AS and ROW FORMAT clauses):</b>
+ </p>
+
+ <p>
+ The <codeph>STORED AS</codeph> clause identifies the format of the underlying data files. Currently, Impala
+ can query more types of file formats than it can create or insert into. Use Hive to perform any create or
+ data load operations that are not currently available in Impala. For example, Impala can create a
+ SequenceFile table but cannot insert data into it. There are also Impala-specific procedures for using
+ compression with each kind of file format. For details about working with data files of various formats, see
+ <xref href="impala_file_formats.xml#file_formats"/>.
+ </p>
+
+ <note>
+ In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing the <codeph>CREATE
+ TABLE</codeph> statement in Hive. See <xref href="impala_avro.xml#avro"/> for details and examples.
+ </note>
+
+ <p>
+ By default (when no <codeph>STORED AS</codeph> clause is specified), data files in Impala tables are created
+ as text files with Ctrl-A (hex 01) characters as the delimiter.
+<!-- Verify if ROW FORMAT is entirely ignored outside of text tables, or does it apply somehow to SequenceFile and/or RCFile too? -->
+ Specify the <codeph>ROW FORMAT DELIMITED</codeph> clause to produce or ingest data files that use a different
+ delimiter character such as tab or <codeph>|</codeph>, or a different line end character such as carriage
+ return or newline. When specifying delimiter and line end characters with the <codeph>FIELDS TERMINATED
+ BY</codeph> and <codeph>LINES TERMINATED BY</codeph> clauses, use <codeph>'\t'</codeph> for tab,
+ <codeph>'\n'</codeph> for newline or linefeed, <codeph>'\r'</codeph> for carriage return, and
+ <codeph>'\0'</codeph> for ASCII <codeph>nul</codeph> (hex 00). For more examples of text
+ tables, see <xref href="impala_txtfile.xml#txtfile"/>.
+ </p>
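+
+ <p>
+ For example, the following statement creates a hypothetical text table whose
+ data files use tab-separated fields and newline-terminated lines:
+ </p>
+
+<codeblock>CREATE TABLE tsv_data (id BIGINT, val STRING)
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
+ LINES TERMINATED BY '\n'
+ STORED AS TEXTFILE;</codeblock>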
+
+ <p>
+ The <codeph>ESCAPED BY</codeph> clause applies both to text files that you create through an
+ <codeph>INSERT</codeph> statement to an Impala <codeph>TEXTFILE</codeph> table, and to existing data files
+ that you put into an Impala table directory. (You can ingest existing data files either by creating the table
+ with <codeph>CREATE EXTERNAL TABLE ... LOCATION</codeph>, the <codeph>LOAD DATA</codeph> statement, or
+ through an HDFS operation such as <codeph>hdfs dfs -put <varname>file</varname>
+ <varname>hdfs_path</varname></codeph>.) Choose an escape character that is not used anywhere else in the
+ file, and put it in front of each instance of the delimiter character that occurs within a field value.
+ Surrounding field values with quotation marks does not help Impala to parse fields with embedded delimiter
+ characters; the quotation marks are considered to be part of the column value. If you want to use
+ <codeph>\</codeph> as the escape character, specify the clause in <cmdname>impala-shell</cmdname> as
+ <codeph>ESCAPED BY '\\'</codeph>.
+ </p>
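+
+ <p>
+ For example, the following statement creates a hypothetical comma-delimited
+ table where any commas embedded in field values are escaped with a backslash:
+ </p>
+
+<codeblock>CREATE TABLE csv_escaped (id BIGINT, description STRING)
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+ ESCAPED BY '\\'
+ STORED AS TEXTFILE;</codeblock>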
+
+ <note conref="../shared/impala_common.xml#common/thorn"/>
+
+ <p>
+ <b>Cloning tables (LIKE clause):</b>
+ </p>
+
+ <p>
+ To create an empty table with the same columns, comments, and other attributes as another table, use the
+ following variation. The <codeph>CREATE TABLE ... LIKE</codeph> form allows a restricted set of clauses,
+ currently only the <codeph>LOCATION</codeph>, <codeph>COMMENT</codeph>, and <codeph>STORED AS</codeph>
+ clauses.
+ </p>
+
+<codeblock>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<varname>db_name</varname>.]<varname>table_name</varname>
+ <ph rev="1.4.0">LIKE { [<varname>db_name</varname>.]<varname>table_name</varname> | PARQUET '<varname>hdfs_path_of_parquet_file</varname>' }</ph>
+ [COMMENT '<varname>table_comment</varname>']
+ [STORED AS <varname>file_format</varname>]
+ [LOCATION '<varname>hdfs_path</varname>']</codeblock>
+
+ <note rev="1.2">
+ To clone the structure of a table and transfer data into it in a single operation, use the <codeph>CREATE
+ TABLE AS SELECT</codeph> syntax described in the next subsection.
+ </note>
+
+ <p>
+ When you clone the structure of an existing table using the <codeph>CREATE TABLE ... LIKE</codeph> syntax,
+ the new table keeps the same file format as the original one, so you only need to specify the <codeph>STORED
+ AS</codeph> clause if you want to use a different file format, or when specifying a view as the original
+ table. (Creating a table <q>like</q> a view produces a text table by default.)
+ </p>
+
+ <p>
+ Although normally Impala cannot create an HBase table directly, Impala can clone the structure of an existing
+ HBase table with the <codeph>CREATE TABLE ... LIKE</codeph> syntax, preserving the file format and metadata
+ from the original table.
+ </p>
+
+ <p>
+ There are some exceptions to the ability to use <codeph>CREATE TABLE ... LIKE</codeph> with an Avro table.
+ For example, you cannot use this technique for an Avro table that is specified with an Avro schema but no
+ columns. When in doubt, check if a <codeph>CREATE TABLE ... LIKE</codeph> operation works in Hive; if not, it
+ typically will not work in Impala either.
+ </p>
+
+ <p>
+ If the original table is partitioned, the new table inherits the same partition key columns. Because the new
+ table is initially empty, it does not inherit the actual partitions that exist in the original one. To create
+ partitions in the new table, insert data or issue <codeph>ALTER TABLE ... ADD PARTITION</codeph> statements.
+ </p>
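+
+ <p>
+ For example, assuming a hypothetical partitioned table <codeph>census</codeph> with a partition key
+ column <codeph>year</codeph>, the following statements clone its structure and then define one
+ partition in the new, initially empty table:
+ </p>
+
+<codeblock>CREATE TABLE census_copy LIKE census;
+ALTER TABLE census_copy ADD PARTITION (year=2016);</codeblock>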
+
+ <p conref="../shared/impala_common.xml#common/create_table_like_view"/>
+
+ <p>
+ Because <codeph>CREATE TABLE ... LIKE</codeph> only manipulates table metadata, not the physical data of the
+ table, issue <codeph>INSERT INTO TABLE</codeph> statements afterward to copy any data from the original table
+ into the new one, optionally converting the data to a new file format. (For some file formats, Impala can do
+ a <codeph>CREATE TABLE ... LIKE</codeph> to create the table, but Impala cannot insert data in that file
+ format; in these cases, you must load the data in Hive. See
+ <xref href="impala_file_formats.xml#file_formats"/> for details.)
+ </p>
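+
+ <p>
+ For example, the following sequence (using hypothetical table names) clones an existing table with
+ a different file format, then copies and converts the data with a separate statement:
+ </p>
+
+<codeblock>CREATE TABLE t2 LIKE t1 STORED AS PARQUET;
+INSERT INTO TABLE t2 SELECT * FROM t1;</codeblock>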
+
+ <p rev="1.2" id="ctas">
+ <b>CREATE TABLE AS SELECT:</b>
+ </p>
+
+ <p>
+ The <codeph>CREATE TABLE AS SELECT</codeph> syntax is a shorthand notation to create a table based on column
+ definitions from another table, and copy data from the source table to the destination table without issuing
+ any separate <codeph>INSERT</codeph> statement. This idiom is so popular that it has its own acronym,
+ <q>CTAS</q>.
+<!--
+ The <codeph>CREATE TABLE AS SELECT</codeph> syntax is as follows:
+ -->
+ </p>
+
+<!-- CREATE TABLE AS <select> now incorporated up higher in the original syntax diagram,
+ thus commented out here.
+ Does CTAS only accept a limited subset of clauses? -->
+
+<!--
+<codeblock rev="1.2">CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <varname>db_name</varname>.]<varname>table_name</varname>
+ [COMMENT '<varname>table_comment</varname>']
+ [STORED AS <varname>file_format</varname>]
+ [LOCATION '<varname>hdfs_path</varname>']
+AS
+ <varname>select_statement</varname></codeblock>
+-->
+
+ <p rev="1.2">
+ See <xref href="impala_select.xml#select"/> for details about query syntax for the <codeph>SELECT</codeph>
+ portion of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+ </p>
+
+ <p rev="1.2">
+ The newly created table inherits the column names that you select from the original table, which you can
+ override by specifying column aliases in the query. Any column or table comments from the original table are
+ not carried over to the new table.
+ </p>
+
+ <p rev="obwl" conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
+
+ <p rev="1.2">
+ For example, the following statements show how you can clone all the data in a table, or a subset of the
+ columns and/or rows, or reorder columns, rename them, or construct them out of expressions:
+ </p>
+
+<codeblock rev="1.2">-- Create new table and copy all data.
+CREATE TABLE clone_of_t1 AS SELECT * FROM t1;
+-- Same idea as CREATE TABLE LIKE, don't copy any data.
+CREATE TABLE empty_clone_of_t1 AS SELECT * FROM t1 WHERE 1=0;
+-- Copy some data.
+CREATE TABLE subset_of_t1 AS SELECT * FROM t1 WHERE x > 100 AND y LIKE 'A%';
+CREATE TABLE summary_of_t1 AS SELECT c1, sum(c2) AS total, avg(c2) AS average FROM t1 GROUP BY c1;
+-- Switch file format.
+CREATE TABLE parquet_version_of_t1 STORED AS PARQUET AS SELECT * FROM t1;
+-- Create tables with different column order, names, or types than the original.
+CREATE TABLE some_columns_from_t1 AS SELECT c1, c3, c5 FROM t1;
+CREATE TABLE reordered_columns_from_t1 AS SELECT c4, c3, c1, c2 FROM t1;
+CREATE TABLE synthesized_columns AS SELECT upper(c1) AS all_caps, c2+c3 AS total, "California" AS state FROM t1;</codeblock>
+
+ <p rev="1.2">
+ As part of a CTAS operation, you can convert the data to any file format that Impala can write (currently,
+ <codeph>TEXTFILE</codeph> and <codeph>PARQUET</codeph>). You cannot specify the lower-level properties of a
+ text table, such as the delimiter. Although you can use a partitioned table as the source and copy data from
+ it, you cannot specify any partitioning clauses for the new table.
+ </p>
+
+ <p rev="1.4.0">
+ <b>CREATE TABLE LIKE PARQUET:</b>
+ </p>
+
+ <p rev="1.4.0">
+ The variation <codeph>CREATE TABLE ... LIKE PARQUET '<varname>hdfs_path_of_parquet_file</varname>'</codeph>
+ lets you skip the column definitions of the <codeph>CREATE TABLE</codeph> statement. The column names and
+ data types are automatically configured based on the organization of the specified Parquet data file, which
+ must already reside in HDFS. You can use a data file located outside the Impala database directories, or a
+ file from an existing Impala Parquet table; either way, Impala only uses the column definitions from the file
+ and does not use the HDFS location for the <codeph>LOCATION</codeph> attribute of the new table. (You can,
+ however, also specify the enclosing directory with the <codeph>LOCATION</codeph> attribute, to both use the
+ same schema as the data file and point the Impala table at the associated directory for querying.)
+ </p>
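+
+ <p>
+ For example, the following hypothetical statement derives the column definitions from an existing
+ Parquet data file, makes the new table use Parquet format, and points the table at the directory
+ containing that file (all paths and names are illustrative):
+ </p>
+
+<codeblock>CREATE TABLE new_table
+  LIKE PARQUET '/user/impala/sample_data/sample.parq'
+  STORED AS PARQUET
+  LOCATION '/user/impala/sample_data';</codeblock>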
+
+ <p rev="1.4.0">
+ The following considerations apply when you use the <codeph>CREATE TABLE LIKE PARQUET</codeph> technique:
+ </p>
+
+ <ul rev="1.4.0">
+ <li>
+ Any column comments from the original table are not preserved in the new table. Each column in the new
+ table has a comment stating the low-level Parquet field type used to deduce the appropriate SQL column
+ type.
+ </li>
+
+ <li>
+ If you use a data file from a partitioned Impala table, any partition key columns from the original table
+ are left out of the new table, because they are represented in HDFS directory names rather than stored in
+ the data file. To preserve the partition information, repeat the same <codeph>PARTITIONED BY</codeph>
+ clause as in the original <codeph>CREATE TABLE</codeph> statement.
+ </li>
+
+ <li>
+ The file format of the new table defaults to text, as with other kinds of <codeph>CREATE TABLE</codeph>
+ statements. To make the new table also use Parquet format, include the clause <codeph>STORED AS
+ PARQUET</codeph> in the <codeph>CREATE TABLE LIKE PARQUET</codeph> statement.
+ </li>
+
+ <li>
+ If the Parquet data file comes from an existing Impala table, currently, any <codeph>TINYINT</codeph> or
+ <codeph>SMALLINT</codeph> columns are turned into <codeph>INT</codeph> columns in the new table.
+ Internally, Parquet stores such values as 32-bit integers.
+ </li>
+
+ <li>
+ When the destination table uses the Parquet file format, the <codeph>CREATE TABLE AS SELECT</codeph> and
+ <codeph>INSERT ... SELECT</codeph> statements always create at least one data file, even if the
+ <codeph>SELECT</codeph> part of the statement does not match any rows. You can use such an empty Parquet
+ data file as a template for subsequent <codeph>CREATE TABLE LIKE PARQUET</codeph> statements.
+ </li>
+ </ul>
+
+ <p>
+ For more details about creating Parquet tables, and examples of the <codeph>CREATE TABLE LIKE
+ PARQUET</codeph> syntax, see <xref href="impala_parquet.xml#parquet"/>.
+ </p>
+
+ <p>
+ <b>Visibility and Metadata (TBLPROPERTIES and WITH SERDEPROPERTIES clauses):</b>
+ </p>
+
+ <p rev="1.2">
+ You can associate arbitrary items of metadata with a table by specifying the <codeph>TBLPROPERTIES</codeph>
+ clause. This clause takes a comma-separated list of key-value pairs and stores those items in the metastore
+ database. You can also change the table properties later with an <codeph>ALTER TABLE</codeph> statement. You
+ can observe the table properties for different delimiter and escape characters using the <codeph>DESCRIBE
+ FORMATTED</codeph> command, and change those settings for an existing table with <codeph>ALTER TABLE ... SET
+ TBLPROPERTIES</codeph>.
+ </p>
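+
+ <p>
+ For example, the following hypothetical statements attach an arbitrary key-value pair to a table at
+ creation time, change it later, and then display the stored properties:
+ </p>
+
+<codeblock>CREATE TABLE events (id BIGINT, details STRING)
+  TBLPROPERTIES ('data_owner'='analytics_team');
+ALTER TABLE events SET TBLPROPERTIES ('data_owner'='reporting_team');
+DESCRIBE FORMATTED events;</codeblock>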
+
+ <p rev="1.2">
+ You can also associate SerDes properties with the table by specifying key-value pairs through the
+ <codeph>WITH SERDEPROPERTIES</codeph> clause. This metadata is not used by Impala, which has its own built-in
+ serializer and deserializer for the file formats it supports. Particular property values might be needed for
+ Hive compatibility with certain variations of file formats, particularly Avro.
+ </p>
+
+ <p>
+ Some DDL operations that interact with other Hadoop components require specifying particular values in the
+ <codeph>SERDEPROPERTIES</codeph> or <codeph>TBLPROPERTIES</codeph> fields, such as creating an Avro table or
+ an HBase table. (You typically create HBase tables in Hive, because they require additional clauses not
+ currently available in Impala.)
+<!-- Haven't got a working example from Lenni, so suppressing this recommendation for now.
+ The Avro schema properties can be specified through either
+ <codeph>TBLPROPERTIES</codeph> or <codeph>SERDEPROPERTIES</codeph>;
+ for best compatibility with future versions of Hive,
+ use <codeph>SERDEPROPERTIES</codeph> in this case.
+-->
+ </p>
+
+ <p>
+ To see the column definitions and column comments for an existing table, for example before issuing a
+ <codeph>CREATE TABLE ... LIKE</codeph> or a <codeph>CREATE TABLE ... AS SELECT</codeph> statement, issue the
+ statement <codeph>DESCRIBE <varname>table_name</varname></codeph>. To see even more detail, such as the
+ location of data files and the values for clauses such as <codeph>ROW FORMAT</codeph> and <codeph>STORED
+ AS</codeph>, issue the statement <codeph>DESCRIBE FORMATTED <varname>table_name</varname></codeph>.
+ <codeph>DESCRIBE FORMATTED</codeph> is also needed to see any overall table comment (as opposed to individual
+ column comments).
+ </p>
+
+ <p>
+ After creating a table, your <cmdname>impala-shell</cmdname> session or another
+ <cmdname>impala-shell</cmdname> connected to the same node can immediately query that table. There might be a
+ brief interval (one statestore heartbeat) before the table can be queried through a different Impala node. To
+ make the <codeph>CREATE TABLE</codeph> statement return only when the table is recognized by all Impala nodes
+ in the cluster, enable the <codeph>SYNC_DDL</codeph> query option.
+ </p>
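+
+ <p>
+ For example, in <cmdname>impala-shell</cmdname> you might enable the option before issuing the DDL
+ statement (table and column names are illustrative):
+ </p>
+
+<codeblock>SET SYNC_DDL=1;
+CREATE TABLE new_table (id BIGINT, s STRING);</codeblock>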
+
+ <p rev="1.4.0">
+ <b>HDFS caching (CACHED IN clause):</b>
+ </p>
+
+ <p rev="1.4.0">
+ If you specify the <codeph>CACHED IN</codeph> clause, any existing or future data files in the table
+ directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching
+ mechanism. See <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/> for details about using the HDFS
+ caching feature.
+ </p>
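+
+ <p>
+ For example, the following hypothetical statement creates a table whose data files are designated
+ to be cached, assuming a cache pool named <codeph>pool1</codeph> was already set up with the
+ <cmdname>hdfs cacheadmin</cmdname> command:
+ </p>
+
+<codeblock>CREATE TABLE cached_table (id BIGINT, s STRING)
+  CACHED IN 'pool1';</codeblock>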
+
+ <p conref="../shared/impala_common.xml#common/impala_cache_replication_factor"/>
+
+<!-- Say something in here about the SHOW statement, e.g. SHOW TABLES, SHOW TABLE/COLUMN STATS, SHOW PARTITIONS. -->
+
+ <p>
+ <b>Column order</b>:
+ </p>
+
+ <p>
+ If you intend to use the table to hold data files produced by some external source, specify the columns in
+ the same order as they appear in the data files.
+ </p>
+
+ <p>
+ If you intend to insert or copy data into the table through Impala, or if you have control over the way
+ externally produced data files are arranged, use your judgment to specify columns in the most convenient
+ order:
+ </p>
+
+ <ul>
+ <li>
+ <p>
+ If certain columns are often <codeph>NULL</codeph>, specify those columns last. You might produce data
+ files that omit these trailing columns entirely; in that case, Impala automatically fills in
+ <codeph>NULL</codeph> values for the missing columns.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ If an unpartitioned table will be used as the source for an <codeph>INSERT ... SELECT</codeph> operation
+ into a partitioned table, specify last in the unpartitioned table any columns that correspond to
+ partition key columns in the partitioned table, and in the same order as the partition key columns are
+ declared in the partitioned table. This technique lets you use <codeph>INSERT ... SELECT *</codeph> when
+ copying data to the partitioned table, rather than specifying each column name individually.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ If you specify columns in an order that you later discover is suboptimal, you can sometimes work around
+ the problem without recreating the table. You can create a view that selects columns from the original
+ table in a permuted order, then do a <codeph>SELECT *</codeph> from the view. When inserting data into a
+ table, you can specify a permuted order for the inserted columns to match the order in the destination
+ table.
+ </p>
+ </li>
+ </ul>
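+
+ <p>
+ For example, the following hypothetical tables illustrate the second technique: the unpartitioned
+ staging table declares the <codeph>year</codeph> column last, matching the partition key of the
+ destination table, so that <codeph>INSERT ... SELECT *</codeph> works without naming each column:
+ </p>
+
+<codeblock>CREATE TABLE staging (id BIGINT, s STRING, year INT);
+CREATE TABLE partitioned_dest (id BIGINT, s STRING) PARTITIONED BY (year INT);
+INSERT INTO partitioned_dest PARTITION (year) SELECT * FROM staging;</codeblock>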
+
+ <p conref="../shared/impala_common.xml#common/hive_blurb"/>
+
+ <p>
+ Impala queries can make use of metadata about the table and columns, such as the number of rows in a table or
+ the number of different values in a column. Prior to Impala 1.2.2, you gathered this metadata by issuing the
+ <codeph>ANALYZE TABLE</codeph> statement in Hive, after creating the table and
+ loading representative data into it. In Impala 1.2.2 and higher, the <codeph>COMPUTE STATS</codeph> statement
+ produces these statistics within Impala, without needing to use Hive at all.
+ </p>
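+
+ <p>
+ For example, after creating and loading a table (name illustrative), you would gather the
+ statistics and then examine them:
+ </p>
+
+<codeblock>COMPUTE STATS t1;
+SHOW TABLE STATS t1;
+SHOW COLUMN STATS t1;</codeblock>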
+
+ <p conref="../shared/impala_common.xml#common/hbase_blurb"/>
+
+ <note>
+ <p>
+ The Impala <codeph>CREATE TABLE</codeph> statement cannot create an HBase table, because it currently does
+ not support the <codeph>STORED BY</codeph> clause needed for HBase tables. Create such tables in Hive, then
+ query them through Impala. For information on using Impala with HBase tables, see
+ <xref href="impala_hbase.xml#impala_hbase"/>.
+ </p>
+ </note>
+
+ <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+ <p rev="2.2.0">
+ To create a table where the data resides in the Amazon Simple Storage Service (S3),
+ specify a <codeph>LOCATION</codeph> attribute with an <codeph>s3a://</codeph> prefix pointing to the data files in S3.
+ You can use this special <codeph>LOCATION</codeph> syntax when creating an empty table,
+ but not as part of a <codeph>CREATE TABLE AS SELECT</codeph> statement.
+ See <xref href="impala_s3.xml#s3"/> for details.
+ </p>
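+
+ <p rev="2.2.0">
+ For example, the following hypothetical statement creates a table whose data resides in an S3
+ bucket (the bucket name, path, and column names are illustrative):
+ </p>
+
+<codeblock>CREATE EXTERNAL TABLE s3_table (id BIGINT, s STRING)
+  LOCATION 's3a://bucket_name/path/to/data';</codeblock>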
+
+ <p conref="../shared/impala_common.xml#common/insert_sort_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
+
+ <p>
+ The <codeph>CREATE TABLE</codeph> statement for an internal table creates a directory in HDFS. The
+ <codeph>CREATE EXTERNAL TABLE</codeph> statement associates the table with an existing HDFS directory, and
+ does not create any new directory in HDFS. To locate the HDFS data directory for a table, issue a
+ <codeph>DESCRIBE FORMATTED <varname>table</varname></codeph> statement. To examine the contents of that HDFS
+ directory, use an OS command such as <codeph>hdfs dfs -ls hdfs://<varname>path</varname></codeph>, either
+ from the OS command line or through the <codeph>shell</codeph> or <codeph>!</codeph> commands in
+ <cmdname>impala-shell</cmdname>.
+ </p>
+
+ <p>
+ The <codeph>CREATE TABLE AS SELECT</codeph> syntax creates data files under the table data directory to hold
+ any data copied by the <codeph>INSERT</codeph> portion of the statement. (Even if no data is copied, Impala
+ might create one or more empty data files.)
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+ <p rev="CDH-19187">
+ <!-- TBD. -->
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/security_blurb"/>
+ <p conref="../shared/impala_common.xml#common/redaction_yes"/>
+
+ <p conref="../shared/impala_common.xml#common/cancel_blurb_maybe"/>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+
+ <p>
+ <xref href="impala_tables.xml#tables"/>,
+ <xref href="impala_alter_table.xml#alter_table"/>, <xref href="impala_drop_table.xml#drop_table"/>,
+ <xref href="impala_partitioning.xml#partitioning"/>, <xref href="impala_tables.xml#internal_tables"/>,
+ <xref href="impala_tables.xml#external_tables"/>, <xref href="impala_compute_stats.xml#compute_stats"/>,
+ <xref href="impala_sync_ddl.xml#sync_ddl"/>, <xref href="impala_show.xml#show_tables"/>,
+ <xref href="impala_show.xml#show_create_table"/>, <xref href="impala_describe.xml#describe"/>
+ </p>
+ </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_create_view.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_create_view.xml b/docs/topics/impala_create_view.xml
new file mode 100644
index 0000000..2458279
--- /dev/null
+++ b/docs/topics/impala_create_view.xml
@@ -0,0 +1,136 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.1" id="create_view">
+
+ <title>CREATE VIEW Statement</title>
+ <titlealts><navtitle>CREATE VIEW</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="DDL"/>
+ <data name="Category" value="Tables"/>
+ <data name="Category" value="Schemas"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">CREATE VIEW statement</indexterm>
+ The <codeph>CREATE VIEW</codeph> statement lets you create a shorthand abbreviation for a more complicated
+ query. The base query can involve joins, expressions, reordered columns, column aliases, and other SQL
+ features that can make a query hard to understand or maintain.
+ </p>
+
+ <p>
+ Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+ <codeph>CREATE VIEW</codeph> only involves changes to metadata in the metastore database, not any data files
+ in HDFS.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>CREATE VIEW [IF NOT EXISTS] <varname>view_name</varname> [(<varname>column_list</varname>)]
+ AS <varname>select_statement</varname></codeblock>
+
+ <p conref="../shared/impala_common.xml#common/ddl_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+ <p>
+ The <codeph>CREATE VIEW</codeph> statement can be useful in scenarios such as the following:
+ </p>
+
+ <ul>
+ <li>
+ To turn even the most lengthy and complicated SQL query into a one-liner. You can issue simple queries
+ against the view from applications, scripts, or interactive queries in <cmdname>impala-shell</cmdname>.
+ For example:
+<codeblock>select * from <varname>view_name</varname>;
+select * from <varname>view_name</varname> order by c1 desc limit 10;</codeblock>
+ The more complicated and hard-to-read the original query, the more benefit there is to simplifying the
+ query using a view.
+ </li>
+
+ <li>
+ To hide the underlying table and column names, to minimize maintenance problems if those names change. In
+ that case, you re-create the view using the new names, and all queries that use the view rather than the
+ underlying tables keep running with no changes.
+ </li>
+
+ <li>
+ To experiment with optimization techniques and make the optimized queries available to all applications.
+ For example, if you find a combination of <codeph>WHERE</codeph> conditions, join order, join hints, and so
+ on that works the best for a class of queries, you can establish a view that incorporates the
+ best-performing techniques. Applications can then make relatively simple queries against the view, without
+ repeating the complicated and optimized logic over and over. If you later find a better way to optimize the
+ original query, when you re-create the view, all the applications immediately take advantage of the
+ optimized base query.
+ </li>
+
+ <li>
+ To simplify a whole class of related queries, especially complicated queries involving joins between
+ multiple tables, complicated expressions in the column list, and other SQL syntax that makes the query
+ difficult to understand and debug. For example, you might create a view that joins several tables, filters
+ using several <codeph>WHERE</codeph> conditions, and selects several columns from the result set.
+ Applications might issue queries against this view that only vary in their <codeph>LIMIT</codeph>,
+ <codeph>ORDER BY</codeph>, and similar simple clauses.
+ </li>
+ </ul>
+
+ <p>
+ For queries that require repeating complicated clauses over and over again, for example in the select list,
+ <codeph>ORDER BY</codeph>, and <codeph>GROUP BY</codeph> clauses, you can use the <codeph>WITH</codeph>
+ clause as an alternative to creating a view.
+ </p>
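+
+ <p>
+ For example, the following hypothetical query uses a <codeph>WITH</codeph> clause to define the
+ same kind of query shorthand as a view, but with a scope limited to the single statement:
+ </p>
+
+<codeblock>WITH top_customers AS (SELECT id, sum(amount) AS total FROM sales GROUP BY id)
+SELECT * FROM top_customers ORDER BY total DESC LIMIT 10;</codeblock>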
+
+ <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+ <p conref="../shared/impala_common.xml#common/complex_types_views"/>
+ <p conref="../shared/impala_common.xml#common/complex_types_views_caveat"/>
+
+ <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+ <p conref="../shared/impala_common.xml#common/security_blurb"/>
+ <p conref="../shared/impala_common.xml#common/redaction_yes"/>
+
+ <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+ <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+ <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+<!-- TK: Elaborate on these, show queries and real output. -->
+
+<codeblock>-- Create a view that is exactly the same as the underlying table.
+create view v1 as select * from t1;
+
+-- Create a view that includes only certain columns from the underlying table.
+create view v2 as select c1, c3, c7 from t1;
+
+-- Create a view that filters the values from the underlying table.
+create view v3 as select distinct c1, c3, c7 from t1 where c1 is not null and c5 > 0;
+
+-- Create a view that reorders and renames columns from the underlying table.
+create view v4 as select c4 as last_name, c6 as address, c2 as birth_date from t1;
+
+-- Create a view that runs functions to convert or transform certain columns.
+create view v5 as select c1, cast(c3 as string) c3, concat(c4,c5) c5, trim(c6) c6, "Constant" c8 from t1;
+
+-- Create a view that hides the complexity of a view query.
+create view v6 as select t1.c1, t2.c2 from t1 join t2 on t1.id = t2.id;
+</codeblock>
+
+<!-- These examples show CREATE VIEW and corresponding DROP VIEW statements, with different combinations
+ of qualified and unqualified names. -->
+
+ <p conref="../shared/impala_common.xml#common/create_drop_view_examples"/>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+
+ <p>
+ <xref href="impala_views.xml#views"/>, <xref href="impala_alter_view.xml#alter_view"/>,
+ <xref href="impala_drop_view.xml#drop_view"/>
+ </p>
+ </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_databases.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_databases.xml b/docs/topics/impala_databases.xml
new file mode 100644
index 0000000..ad0511f
--- /dev/null
+++ b/docs/topics/impala_databases.xml
@@ -0,0 +1,65 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="databases">
+
+ <title>Overview of Impala Databases</title>
+ <titlealts><navtitle>Databases</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Databases"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Data Analysts"/>
+ <data name="Category" value="Developers"/>
+ <data name="Category" value="Querying"/>
+ <data name="Category" value="Schemas"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ In Impala, a database is a logical container for a group of tables. Each database defines a separate
+ namespace. Within a database, you can refer to the tables inside it using their unqualified names. Different
+ databases can contain tables with identical names.
+ </p>
+
+ <p>
+ Creating a database is a lightweight operation. The only database-specific properties to configure are
+ <codeph>LOCATION</codeph> and <codeph>COMMENT</codeph>. There is no <codeph>ALTER DATABASE</codeph> statement.
+ </p>
+
+ <p>
+ Typically, you create a separate database for each project or application, to avoid naming conflicts between
+ tables and to make clear which tables are related to each other. The <codeph>USE</codeph> statement lets
+ you switch between databases. Unqualified references to tables, views, and functions refer to objects
+ within the current database. You can also refer to objects in other databases by using qualified names
+ of the form <codeph><varname>dbname</varname>.<varname>object_name</varname></codeph>.
+ </p>
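+
+ <p>
+ For example (the database, table, and column names are illustrative):
+ </p>
+
+<codeblock>USE sales_db;
+SELECT * FROM orders;                  -- Unqualified name, in the current database.
+SELECT * FROM marketing_db.campaigns;  -- Qualified name, in a different database.</codeblock>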
+
+ <p>
+ Each database is physically represented by a directory in HDFS. When you do not specify a <codeph>LOCATION</codeph>
+ attribute, the directory is located in the Impala data directory with the associated tables managed by Impala.
+ When you do specify a <codeph>LOCATION</codeph> attribute, any read and write operations for tables in that
+ database are relative to the specified HDFS directory.
+ </p>
+
+ <p>
+ There is a special database, named <codeph>default</codeph>, where you begin when you connect to Impala.
+ Tables created in <codeph>default</codeph> are physically located one level higher in HDFS than all the
+ user-created databases.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/builtins_db"/>
+
+ <p>
+ <b>Related statements:</b>
+ </p>
+
+ <p>
+ <xref href="impala_create_database.xml#create_database"/>,
+ <xref href="impala_drop_database.xml#drop_database"/>, <xref href="impala_use.xml#use"/>,
+ <xref href="impala_show.xml#show_databases"/>
+ </p>
+ </conbody>
+</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_datatypes.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_datatypes.xml b/docs/topics/impala_datatypes.xml
new file mode 100644
index 0000000..e45867e
--- /dev/null
+++ b/docs/topics/impala_datatypes.xml
@@ -0,0 +1,43 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="datatypes">
+
+ <title>Data Types</title>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Impala Data Types"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Data Analysts"/>
+ <data name="Category" value="Developers"/>
+ <data name="Category" value="SQL"/>
+ <data name="Category" value="Schemas"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="Cloudera">data types</indexterm>
+ Impala supports a set of data types that you can use for table columns, expression values, and function
+ arguments and return values.
+ </p>
+
+ <note>
+ Currently, Impala supports only scalar types, not composite or nested types. Accessing a table containing any
+ columns with unsupported types causes an error.
+ </note>
+
+ <p outputclass="toc"/>
+
+ <p>
+ For the notation to write literals of each of these data types, see
+ <xref href="impala_literals.xml#literals"/>.
+ </p>
+
+ <p>
+ See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/> for differences between Impala and
+ Hive data types.
+ </p>
+ </conbody>
+</concept>