Posted to commits@impala.apache.org by jr...@apache.org on 2016/07/26 23:04:58 UTC
[06/22] incubator-impala git commit: First try at porting over the source files necessary for the Impala SQL Reference.

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_porting.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_porting.xml b/docs/topics/impala_porting.xml
new file mode 100644
index 0000000..c9c8e52
--- /dev/null
+++ b/docs/topics/impala_porting.xml
@@ -0,0 +1,622 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="porting">
+
+  <title>Porting SQL from Other Database Systems to Impala</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Databases"/>
+      <data name="Category" value="Hive"/>
+      <data name="Category" value="Oracle"/>
+      <data name="Category" value="MySQL"/>
+      <data name="Category" value="PostgreSQL"/>
+      <data name="Category" value="Troubleshooting"/>
+      <data name="Category" value="Porting"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">porting</indexterm>
+      Although Impala uses standard SQL for queries, you might need to modify SQL source when bringing applications
+      to Impala, due to variations in data types, built-in functions, vendor language extensions, and
+      Hadoop-specific syntax. Even when SQL is working correctly, you might make further minor modifications for
+      best performance.
+    </p>
+
+    <p outputclass="toc inpage"/>
+  </conbody>
+
+  <concept id="porting_ddl_dml">
+
+    <title>Porting DDL and DML Statements</title>
+
+    <conbody>
+
+      <p>
+        When adapting SQL code from a traditional database system to Impala, expect to find a number of differences
+        in the DDL statements that you use to set up the schema. Clauses related to physical layout of files,
+        tablespaces, and indexes have no equivalent in Impala. You might restructure your schema considerably to
+        account for the Impala partitioning scheme and Hadoop file formats.
+      </p>
+
+      <p>
+        Expect SQL queries to have a much higher degree of compatibility. With modest rewriting to address vendor
+        extensions and features not yet supported in Impala, you might be able to run identical or almost-identical
+        query text on both systems.
+      </p>
+
+      <p>
+        Therefore, consider moving the DDL into a separate Impala-specific setup script. Focus your reuse
+        and ongoing tuning efforts on the code for SQL queries.
+      </p>
+    </conbody>
+  </concept>
+
+  <concept id="porting_data_types">
+
+    <title>Porting Data Types from Other Database Systems</title>
+
+    <conbody>
+
+      <ul>
+        <li>
+          <p>
+            Change any <codeph>VARCHAR</codeph>, <codeph>VARCHAR2</codeph>, and <codeph>CHAR</codeph> columns to
+            <codeph>STRING</codeph>. Remove any length constraints from the column declarations; for example,
+            change <codeph>VARCHAR(32)</codeph> or <codeph>CHAR(1)</codeph> to <codeph>STRING</codeph>. Impala is
+            very flexible about the length of string values; it does not impose any length constraints
+            or do any special processing (such as blank-padding) for <codeph>STRING</codeph> columns.
+            (In Impala 2.0 and higher, there are data types <codeph>VARCHAR</codeph> and <codeph>CHAR</codeph>,
+            with length constraints for both types and blank-padding for <codeph>CHAR</codeph>.
+            However, for performance reasons, it is still preferable to use <codeph>STRING</codeph>
+            columns where practical.)
+          </p>
+        </li>
+
+        <li>
+          <p>
+            For national language character types such as <codeph>NCHAR</codeph>, <codeph>NVARCHAR</codeph>, or
+            <codeph>NCLOB</codeph>, be aware that while Impala can store and query UTF-8 character data, currently
+            some string manipulation operations only work correctly with ASCII data. See
+            <xref href="impala_string.xml#string"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Change any <codeph>DATE</codeph>, <codeph>DATETIME</codeph>, or <codeph>TIME</codeph> columns to
+            <codeph>TIMESTAMP</codeph>. Remove any precision constraints. Remove any timezone clauses, and make
+            sure your application logic or ETL process accounts for the fact that Impala expects all
+            <codeph>TIMESTAMP</codeph> values to be in
+            <xref href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time" scope="external" format="html">Coordinated
+            Universal Time (UTC)</xref>. See <xref href="impala_timestamp.xml#timestamp"/> for information about
+            the <codeph>TIMESTAMP</codeph> data type, and
+            <xref href="impala_datetime_functions.xml#datetime_functions"/> for conversion functions for different
+            date and time formats.
+          </p>
+          <p>
+            You might also need to adapt date- and time-related literal values and format strings to use the
+            supported Impala date and time formats. If your date and time literals use different separators, or
+            a different number of placeholders such as <codeph>YY</codeph> and <codeph>MM</codeph>, than Impala
+            expects, consider using calls to <codeph>regexp_replace()</codeph> to transform those values to the
+            Impala-compatible format. See <xref href="impala_timestamp.xml#timestamp"/> for information about the
+            allowed formats for date and time literals, and
+            <xref href="impala_string_functions.xml#string_functions"/> for string conversion functions such as
+            <codeph>regexp_replace()</codeph>.
+          </p>
+          <p>
+            Instead of <codeph>SYSDATE</codeph>, call the function <codeph>NOW()</codeph>.
+          </p>
+          <p>
+            Instead of adding or subtracting directly from a date value to produce a value <varname>N</varname>
+            days in the past or future, use an <codeph>INTERVAL</codeph> expression, for example <codeph>NOW() +
+            INTERVAL 30 DAYS</codeph>.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Although Impala supports <codeph>INTERVAL</codeph> expressions for datetime arithmetic, as shown in
+            <xref href="impala_timestamp.xml#timestamp"/>, <codeph>INTERVAL</codeph> is not available as a column
+            data type in Impala. For any <codeph>INTERVAL</codeph> values stored in tables, convert them to numeric
+            values that you can add or subtract using the functions in
+            <xref href="impala_datetime_functions.xml#datetime_functions"/>. For example, if you had a table
+            <codeph>DEADLINES</codeph> with an <codeph>INT</codeph> column <codeph>TIME_PERIOD</codeph>, you could
+            construct dates N days in the future like so:
+          </p>
+<codeblock>SELECT NOW() + INTERVAL time_period DAYS FROM deadlines;</codeblock>
+        </li>
+
+        <li>
+          <p>
+            For <codeph>YEAR</codeph> columns, change to the smallest Impala integer type that has sufficient
+            range. See <xref href="impala_datatypes.xml#datatypes"/> for details about ranges, casting, and so on
+            for the various numeric data types.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Change any <codeph>DECIMAL</codeph> and <codeph>NUMBER</codeph> types. If fixed-point precision is not
+            required, you can use <codeph>FLOAT</codeph> or <codeph>DOUBLE</codeph> on the Impala side depending on
+            the range of values. For applications that require precise decimal values, such as financial data, you
+            might need to make more extensive changes to table structure and application logic, such as using
+            separate integer columns for dollars and cents, or encoding numbers as string values and writing UDFs
+            to manipulate them. See <xref href="impala_datatypes.xml#datatypes"/> for details about ranges,
+            casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            <codeph>FLOAT</codeph>, <codeph>DOUBLE</codeph>, and <codeph>REAL</codeph> types are supported in
+            Impala. Remove any precision and scale specifications. (In Impala, <codeph>REAL</codeph> is just an
+            alias for <codeph>DOUBLE</codeph>; columns declared as <codeph>REAL</codeph> are turned into
+            <codeph>DOUBLE</codeph> behind the scenes.) See <xref href="impala_datatypes.xml#datatypes"/> for
+            details about ranges, casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Most integer types from other systems have equivalents in Impala, perhaps under different names such as
+            <codeph>BIGINT</codeph> instead of <codeph>INT8</codeph>. For any that are unavailable, for example
+            <codeph>MEDIUMINT</codeph>, switch to the smallest Impala integer type that has sufficient range.
+            Remove any precision specifications. See <xref href="impala_datatypes.xml#datatypes"/> for details
+            about ranges, casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Remove any <codeph>UNSIGNED</codeph> constraints. All Impala numeric types are signed. See
+            <xref href="impala_datatypes.xml#datatypes"/> for details about ranges, casting, and so on for the
+            various numeric data types.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            For any types holding bitwise values, use an integer type with enough range to hold all the relevant
+            bits within a positive integer. See <xref href="impala_datatypes.xml#datatypes"/> for details about
+            ranges, casting, and so on for the various numeric data types.
+          </p>
+          <p>
+            For example, <codeph>TINYINT</codeph> has a maximum positive value of 127, not 255, so to manipulate
+            8-bit bitfields as positive numbers switch to the next larger type, <codeph>SMALLINT</codeph>.
+          </p>
+<codeblock>[localhost:21000] &gt; select cast(127*2 as tinyint);
++--------------------------+
+| cast(127 * 2 as tinyint) |
++--------------------------+
+| -2                       |
++--------------------------+
+[localhost:21000] &gt; select cast(128 as tinyint);
++----------------------+
+| cast(128 as tinyint) |
++----------------------+
+| -128                 |
++----------------------+
+[localhost:21000] &gt; select cast(127*2 as smallint);
++---------------------------+
+| cast(127 * 2 as smallint) |
++---------------------------+
+| 254                       |
++---------------------------+</codeblock>
+          <p>
+            Impala does not support notation such as <codeph>b'0101'</codeph> for bit literals.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            For character-based large objects such as <codeph>CLOB</codeph> or <codeph>TEXT</codeph> values up to
+            32 KB in size, use <codeph>STRING</codeph>. Binary large objects
+            such as <codeph>BLOB</codeph>, <codeph>RAW</codeph>, <codeph>BINARY</codeph>, and
+            <codeph>VARBINARY</codeph> do not currently have an equivalent in Impala.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            For Boolean-like types such as <codeph>BOOL</codeph>, use the Impala <codeph>BOOLEAN</codeph> type.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Because Impala currently does not support composite or nested types, any spatial data types in other
+            database systems do not have direct equivalents in Impala. You could represent spatial values in string
+            format and write UDFs to process them. See <xref href="impala_udf.xml#udfs"/> for details. Where
+            practical, separate spatial types into separate tables so that Impala can still work with the
+            non-spatial data.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Take out any <codeph>DEFAULT</codeph> clauses. Impala can use data files produced from many different
+            sources, such as Pig, Hive, or MapReduce jobs. The fast import mechanisms of <codeph>LOAD DATA</codeph>
+            and external tables mean that Impala is flexible about the format of data files, and Impala does not
+            necessarily validate or cleanse data before querying it. When copying data through Impala
+            <codeph>INSERT</codeph> statements, you can use conditional functions such as <codeph>CASE</codeph> or
+            <codeph>NVL</codeph> to substitute some other value for <codeph>NULL</codeph> fields; see
+            <xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Take out any constraints from your <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>
+            statements, for example <codeph>PRIMARY KEY</codeph>, <codeph>FOREIGN KEY</codeph>,
+            <codeph>UNIQUE</codeph>, <codeph>NOT NULL</codeph>, <codeph>UNSIGNED</codeph>, or
+            <codeph>CHECK</codeph> constraints. Impala can use data files produced from many different sources,
+            such as Pig, Hive, or MapReduce jobs. Therefore, Impala expects initial data validation to happen
+            earlier during the ETL or ELT cycle. After data is loaded into Impala tables, you can perform queries
+            to test for <codeph>NULL</codeph> values. When copying data through Impala <codeph>INSERT</codeph>
+            statements, you can use conditional functions such as <codeph>CASE</codeph> or <codeph>NVL</codeph> to
+            substitute some other value for <codeph>NULL</codeph> fields; see
+            <xref href="impala_conditional_functions.xml#conditional_functions"/> for details.
+          </p>
+          <p>
+            Do as much verification as practical before loading data into Impala. After data is loaded into Impala,
+            you can do further verification using SQL queries to check if values have expected ranges, if values
+            are <codeph>NULL</codeph> or not, and so on. If there is a problem with the data, you will need to
+            re-run earlier stages of the ETL process, or do an <codeph>INSERT ... SELECT</codeph> statement in
+            Impala to copy the faulty data to a new table and transform or filter out the bad values.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Take out any <codeph>CREATE INDEX</codeph>, <codeph>DROP INDEX</codeph>, and <codeph>ALTER
+            INDEX</codeph> statements, and equivalent <codeph>ALTER TABLE</codeph> statements. Remove any
+            <codeph>INDEX</codeph>, <codeph>KEY</codeph>, or <codeph>PRIMARY KEY</codeph> clauses from
+            <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements. Impala is optimized for bulk
+            read operations for data warehouse-style queries, and therefore does not support indexes for its
+            tables.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Calls to built-in functions with out-of-range or otherwise incorrect arguments return
+            <codeph>NULL</codeph> in Impala as opposed to raising exceptions. (This rule applies even when the
+            <codeph>ABORT_ON_ERROR=true</codeph> query option is in effect.) Run small-scale queries using
+            representative data to double-check that calls to built-in functions are returning expected values
+            rather than <codeph>NULL</codeph>. For example, unsupported <codeph>CAST</codeph> operations do not
+            raise an error in Impala:
+          </p>
+<codeblock>select cast('foo' as int);
++--------------------+
+| cast('foo' as int) |
++--------------------+
+| NULL               |
++--------------------+</codeblock>
+        </li>
+
+        <li>
+          <p>
+            For any other type not supported in Impala, you could represent their values in string format and write
+            UDFs to process them. See <xref href="impala_udf.xml#udfs"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            To detect the presence of unsupported or unconvertible data types in data files, do initial testing
+            with the <codeph>ABORT_ON_ERROR=true</codeph> query option in effect. This option causes queries to
+            fail immediately if they encounter disallowed type conversions. See
+            <xref href="impala_abort_on_error.xml#abort_on_error"/> for details. For example:
+          </p>
+<codeblock>set abort_on_error=true;
+select count(*) from (select * from t1);
+-- The above query will fail if the data files for T1 contain any
+-- values that can't be converted to the expected Impala data types.
+-- For example, if T1.C1 is defined as INT but the column contains
+-- floating-point values like 1.1, the query will return an error.</codeblock>
+        </li>
+      </ul>
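+
+      <p>
+        As a rough sketch of how several of the preceding rules combine, the following example adapts a
+        MySQL-style table definition for Impala. The table and column names are hypothetical, and the type
+        choices are one reasonable mapping rather than the only one:
+      </p>
+<codeblock>-- Hypothetical original definition from another database system:
+--   CREATE TABLE orders (
+--     id            INT UNSIGNED NOT NULL PRIMARY KEY,
+--     status        CHAR(1) DEFAULT 'N',
+--     customer_name VARCHAR(64),
+--     placed        DATETIME,
+--     fiscal_year   YEAR
+--   );
+
+-- A possible Impala version: constraints, defaults, and lengths removed.
+CREATE TABLE orders (
+  id            BIGINT,    -- unsigned 32-bit range does not fit in a signed INT
+  status        STRING,
+  customer_name STRING,
+  placed        TIMESTAMP, -- values assumed to be converted to UTC during ETL
+  fiscal_year   SMALLINT   -- smallest integer type with sufficient range
+);</codeblock>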
+    </conbody>
+  </concept>
+
+  <concept id="porting_statements">
+
+    <title>SQL Statements to Remove or Adapt</title>
+
+    <conbody>
+
+      <p>
+        Some SQL statements or clauses that you might be familiar with are not currently supported in Impala:
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            Impala has no <codeph>DELETE</codeph> statement. Impala is intended for data warehouse-style operations
+            where you do bulk moves and transforms of large quantities of data. Instead of using
+            <codeph>DELETE</codeph>, use <codeph>INSERT OVERWRITE</codeph> to entirely replace the contents of a
+            table or partition, or use <codeph>INSERT ... SELECT</codeph> to copy a subset of data (everything but
+            the rows you intended to delete) from one table to another. See <xref href="impala_dml.xml#dml"/> for
+            an overview of Impala DML statements.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Impala has no <codeph>UPDATE</codeph> statement. Impala is intended for data warehouse-style operations
+            where you do bulk moves and transforms of large quantities of data. Instead of using
+            <codeph>UPDATE</codeph>, do all necessary transformations early in the ETL process, such as in the job
+            that generates the original data, or when copying from one table to another to convert to a particular
+            file format or partitioning scheme. See <xref href="impala_dml.xml#dml"/> for an overview of Impala DML
+            statements.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Impala has no transactional statements, such as <codeph>COMMIT</codeph> or <codeph>ROLLBACK</codeph>.
+            Impala effectively works like the <codeph>AUTOCOMMIT</codeph> mode in some database systems, where
+            changes take effect as soon as they are made.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            If your database, table, column, or other names conflict with Impala reserved words, use different
+            names or quote the names with backticks. See <xref href="impala_reserved_words.xml#reserved_words"/>
+            for the current list of Impala reserved words.
+          </p>
+          <p>
+            Conversely, if you use a keyword that Impala does not recognize, it might be interpreted as a table or
+            column alias. For example, in <codeph>SELECT * FROM t1 NATURAL JOIN t2</codeph>, Impala does not
+            recognize the <codeph>NATURAL</codeph> keyword and interprets it as an alias for the table
+            <codeph>t1</codeph>. If you experience any unexpected behavior with queries, check the list of reserved
+            words to make sure all keywords in join and <codeph>WHERE</codeph> clauses are recognized.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Impala supports subqueries only in the <codeph>FROM</codeph> clause of a query, not within the
+            <codeph>WHERE</codeph> clauses. Therefore, you cannot use clauses such as <codeph>WHERE
+            <varname>column</varname> IN (<varname>subquery</varname>)</codeph>. Also, Impala does not allow
+            <codeph>EXISTS</codeph> or <codeph>NOT EXISTS</codeph> clauses (although <codeph>EXISTS</codeph> is a
+            reserved keyword).
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Impala supports <codeph>UNION</codeph> and <codeph>UNION ALL</codeph> set operators, but not
+            <codeph>INTERSECT</codeph>. <ph conref="../shared/impala_common.xml#common/union_all_vs_union"/>
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Within queries, Impala requires query aliases for any subqueries:
+          </p>
+<codeblock>-- Without the alias 'contents_of_t1' at the end, query gives syntax error.
+select count(*) from (select * from t1) contents_of_t1;</codeblock>
+        </li>
+
+        <li>
+          <p>
+            When an alias is declared for an expression in a query, that alias cannot be referenced again within
+            the same query block:
+          </p>
+<codeblock>-- Can't reference AVERAGE twice in the SELECT list where it's defined.
+select avg(x) as average, average+1 from t1 group by x;
+ERROR: AnalysisException: couldn't resolve column reference: 'average'
+
+-- Although it can be referenced again later in the same query.
+select avg(x) as average from t1 group by x having average &gt; 3;</codeblock>
+          <p>
+            For Impala, either repeat the expression, or abstract the expression into a <codeph>WITH</codeph>
+            clause, creating named columns that can be referenced multiple times anywhere in the base query:
+          </p>
+<codeblock>-- The following 2 query forms are equivalent.
+select avg(x) as average, avg(x)+1 from t1 group by x;
+with avg_t as (select avg(x) average from t1 group by x) select average, average+1 from avg_t;</codeblock>
+<!-- An alternative bunch of queries to use in the example above.
+[localhost:21000] > select x*x as x_squared from t1;
+
+[localhost:21000] > select x*x as x_squared from t1 where x_squared < 100;
+ERROR: AnalysisException: couldn't resolve column reference: 'x_squared'
+[localhost:21000] > select x*x as x_squared, x_squared * pi() as pi_x_squared from t1;
+ERROR: AnalysisException: couldn't resolve column reference: 'x_squared'
+[localhost:21000] > select x*x as x_squared from t1 group by x_squared;
+
+[localhost:21000] > select x*x as x_squared from t1 group by x_squared having x_squared < 100;
+-->
+        </li>
+
+        <li>
+          <p>
+            Impala does not support certain rarely used join types that are less appropriate for high-volume tables
+            used for data warehousing. In some cases, Impala supports join types but requires explicit syntax to
+            ensure you do not do inefficient joins of huge tables by accident. For example, Impala does not support
+            natural joins or anti-joins, and requires the <codeph>CROSS JOIN</codeph> operator for Cartesian
+            products. See <xref href="impala_joins.xml#joins"/> for details on the syntax for Impala join clauses.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            Impala has a limited choice of partitioning types. Partitions are defined based on each distinct
+            combination of values for one or more partition key columns. Impala does not redistribute or check data
+            to create evenly distributed partitions; you must choose partition key columns based on your knowledge
+            of the data volume and distribution. Adapt any tables that use range, list, hash, or key partitioning
+            to use the Impala partition syntax for <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>
+            statements. Impala partitioning is similar to range partitioning where every range has exactly one
+            value, or key partitioning where the hash function produces a separate bucket for every combination of
+            key values. See <xref href="impala_partitioning.xml#partitioning"/> for usage details, and
+            <xref href="impala_create_table.xml#create_table"/> and
+            <xref href="impala_alter_table.xml#alter_table"/> for syntax.
+          </p>
+          <note>
+            Because the number of separate partitions is potentially higher than in other database systems, keep a
+            close eye on the number of partitions and the volume of data in each one; scale back the number of
+            partition key columns if you end up with too many partitions with a small volume of data in each one.
+            Remember, to distribute work for a query across a cluster, you need at least one HDFS block per node.
+            HDFS blocks are typically multiple megabytes, <ph rev="parquet_block_size">especially</ph> for Parquet
+            files. Therefore, if each partition holds only a few megabytes of data, you are unlikely to see much
+            parallelism in the query because such a small amount of data is typically processed by a single node.
+          </note>
+        </li>
+
+        <li>
+          <p>
+            For <q>top-N</q> queries, Impala uses the <codeph>LIMIT</codeph> clause rather than comparing against a
+            pseudocolumn named <codeph>ROWNUM</codeph> or <codeph>ROW_NUM</codeph>. See
+            <xref href="impala_limit.xml#limit"/> for details.
+          </p>
+        </li>
+      </ul>
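+
+      <p>
+        The following sketch shows how a few of these rewrites might look in practice. The tables
+        <codeph>t1</codeph> and <codeph>t1_kept</codeph> and the column <codeph>c1</codeph> are hypothetical;
+        <codeph>t1_kept</codeph> is assumed to already exist with the same columns as <codeph>t1</codeph>.
+      </p>
+<codeblock>-- Instead of DELETE FROM t1 WHERE c1 &lt; 0, copy everything except those rows.
+-- Rows where c1 is NULL would not have been deleted, so keep them too.
+INSERT OVERWRITE TABLE t1_kept
+  SELECT * FROM t1 WHERE c1 &gt;= 0 OR c1 IS NULL;
+
+-- Instead of filtering on a ROWNUM pseudocolumn, use ORDER BY with LIMIT.
+SELECT c1 FROM t1_kept ORDER BY c1 DESC LIMIT 10;</codeblock>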
+    </conbody>
+  </concept>
+
+  <concept id="porting_antipatterns">
+
+    <title>SQL Constructs to Double-Check</title>
+
+    <conbody>
+
+      <p>
+        Some SQL constructs that are supported have behavior or defaults more oriented towards convenience than
+        optimal performance. Also, sometimes machine-generated SQL, perhaps issued through JDBC or ODBC
+        applications, might have inefficiencies or exceed internal Impala limits. As you port SQL code, be alert
+        and change these things where appropriate:
+      </p>
+
+      <ul>
+        <li>
+          <p>
+            A <codeph>CREATE TABLE</codeph> statement with no <codeph>STORED AS</codeph> clause creates data files
+            in plain text format, which is convenient for data interchange but not a good choice for high-volume
+            data with high-performance queries. See <xref href="impala_file_formats.xml#file_formats"/> for why and
+            how to use specific file formats for compact data and high-performance queries. Especially see
+            <xref href="impala_parquet.xml#parquet"/>, for details about the file format most heavily optimized for
+            large-scale data warehouse queries.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            A <codeph>CREATE TABLE</codeph> statement with no <codeph>PARTITIONED BY</codeph> clause stores all the
+            data files in the same physical location, which can lead to scalability problems when the data volume
+            becomes large.
+          </p>
+          <p>
+            On the other hand, adapting tables that were already partitioned in a different database system could
+            produce an Impala table with a high number of partitions and not enough data in each one, leading to
+            underutilization of Impala's parallel query features.
+          </p>
+          <p>
+            See <xref href="impala_partitioning.xml#partitioning"/> for details about setting up partitioning and
+            tuning the performance of queries on partitioned tables.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The <codeph>INSERT ... VALUES</codeph> syntax is suitable for setting up toy tables with a few rows for
+            functional testing, but because each such statement creates a separate tiny file in HDFS, it is not a
+            scalable technique for loading megabytes or gigabytes (let alone petabytes) of data. Consider revising
+            your data load process to produce raw data files outside of Impala, then setting up Impala external
+            tables or using the <codeph>LOAD DATA</codeph> statement to use those data files instantly in Impala
+            tables, with no conversion or indexing stage. See <xref href="impala_tables.xml#external_tables"/> and
+            <xref href="impala_load_data.xml#load_data"/> for details about the Impala techniques for working with
+            data files produced outside of Impala; see <xref href="impala_tutorial.xml#tutorial_etl"/> for examples
+            of ETL workflow for Impala.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            If your ETL process is not optimized for Hadoop, you might end up with highly fragmented small data
+            files, or a single giant data file that cannot take advantage of distributed parallel queries or
+            partitioning. In this case, use an <codeph>INSERT ... SELECT</codeph> statement to copy the data into a
+            new table and reorganize into a more efficient layout in the same operation. See
+            <xref href="impala_insert.xml#insert"/> for details about the <codeph>INSERT</codeph> statement.
+          </p>
+          <p>
+            You can do <codeph>INSERT ... SELECT</codeph> into a table with a more efficient file format (see
+            <xref href="impala_file_formats.xml#file_formats"/>) or from an unpartitioned table into a partitioned
+            one (see <xref href="impala_partitioning.xml#partitioning"/>).
+          </p>
+        </li>
+
+        <li>
+          <p>
+            The number of expressions allowed in an Impala query might be smaller than for some other database
+            systems, causing failures for very complicated queries (typically produced by automated SQL
+            generators). Where practical, keep the number of expressions in the <codeph>WHERE</codeph> clauses to
+            approximately 2000 or fewer. As a workaround, set the query option
+            <codeph>DISABLE_CODEGEN=true</codeph> if queries fail for this reason. See
+            <xref href="impala_disable_codegen.xml#disable_codegen"/> for details.
+          </p>
+        </li>
+
+        <li>
+          <p>
+            If practical, rewrite <codeph>UNION</codeph> queries to use the <codeph>UNION ALL</codeph> operator
+            instead. <ph conref="../shared/impala_common.xml#common/union_all_vs_union"/>
+          </p>
+        </li>
+      </ul>
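+
+      <p>
+        For instance, a table loaded as plain text might be reorganized along the following lines. This is
+        only a sketch: the table names, columns, and choice of partition key are hypothetical, and
+        <codeph>sales_text</codeph> is assumed to already exist with matching columns.
+      </p>
+<codeblock>-- Explicit file format and partitioning instead of the text-format defaults.
+CREATE TABLE sales_parquet (id BIGINT, amount DOUBLE)
+  PARTITIONED BY (sale_year SMALLINT)
+  STORED AS PARQUET;
+
+-- Copy and reorganize the data in one operation. The partition key column
+-- comes last in the SELECT list for a dynamic partition insert.
+INSERT INTO sales_parquet PARTITION (sale_year)
+  SELECT id, amount, sale_year FROM sales_text;</codeblock>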
+    </conbody>
+  </concept>
+
+  <concept id="porting_next">
+
+    <title>Next Porting Steps after Verifying Syntax and Semantics</title>
+
+    <conbody>
+
+      <p>
+        Some of the decisions you make during the porting process described in this section also have a
+        substantial impact on performance. After your SQL code is ported and working correctly, double-check the
+        performance-related aspects of your schema design, physical layout, and queries to make sure that the
+        ported application is taking full advantage of Impala's parallelism, performance-related SQL features, and
+        integration with Hadoop components.
+      </p>
+
+      <ul>
+        <li>
+          Have you run the <codeph>COMPUTE STATS</codeph> statement on each table involved in join queries? Have
+          you also run <codeph>COMPUTE STATS</codeph> for each table used as the source table in an <codeph>INSERT
+          ... SELECT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph> statement?
+        </li>
+
+        <li>
+          Are you using the most efficient file format for your data volumes, table structure, and query
+          characteristics?
+        </li>
+
+        <li>
+          Are you using partitioning effectively? That is, have you partitioned on columns that are often used for
+          filtering in <codeph>WHERE</codeph> clauses? Have you partitioned at the right granularity so that there
+          is enough data in each partition to parallelize the work for each query?
+        </li>
+
+        <li>
+          Does your ETL process produce a relatively small number of multi-megabyte data files (good) rather than a
+          huge number of small files (bad)?
+        </li>
+      </ul>
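+
+      <p>
+        For example, the first checklist item could be verified along these lines. The table name
+        <codeph>sales</codeph> is hypothetical; substitute each table involved in your joins or used as a
+        source for <codeph>INSERT ... SELECT</codeph> or <codeph>CREATE TABLE AS SELECT</codeph>.
+      </p>
+
+<codeblock>-- Gather table and column statistics for the table.
+compute stats sales;
+
+-- Check that the row counts and column statistics are filled in.
+show table stats sales;
+show column stats sales;
+</codeblock>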
+
+      <p>
+        See <xref href="impala_performance.xml#performance"/> for details about the whole performance tuning
+        process.
+      </p>
+    </conbody>
+  </concept>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_query_options.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_query_options.xml b/docs/topics/impala_query_options.xml
new file mode 100644
index 0000000..1011746
--- /dev/null
+++ b/docs/topics/impala_query_options.xml
@@ -0,0 +1,75 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="query_options">
+
+  <title>Query Options for the SET Statement</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="impala-shell"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Configuring"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      You can specify the following options using the <codeph>SET</codeph> statement, and those settings affect all
+      queries issued from that session.
+    </p>
+
+    <p>
+      Some query options are useful in day-to-day operations for improving usability, performance, or flexibility.
+    </p>
+
+    <p>
+      Other query options control special-purpose aspects of Impala operation and are intended primarily for
+      advanced debugging or troubleshooting.
+    </p>
+
+    <p>
+      Options with Boolean parameters can be set to 1 or <codeph>true</codeph> to enable, or 0 or <codeph>false</codeph>
+      to turn off.
+    </p>
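+
+    <p>
+      For instance, either of the first two statements below enables a Boolean option, and the third turns it
+      off again. <codeph>DISABLE_CODEGEN</codeph> is used here only as an illustration; other Boolean options
+      are expected to accept the same values.
+    </p>
+
+<codeblock>set disable_codegen=true;
+set disable_codegen=1;
+set disable_codegen=false;
+</codeblock>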
+
+    <note rev="2.0.0">
+      In Impala 2.0 and later, you can set query options directly through the JDBC and ODBC interfaces by using the
+      <codeph>SET</codeph> statement. Formerly, <codeph>SET</codeph> was only available as a command within the
+      <cmdname>impala-shell</cmdname> interpreter.
+    </note>
+
+<!-- This is the list including defaults from the pre-release 1.2 impala-shell:
+	ABORT_ON_DEFAULT_LIMIT_EXCEEDED: 0
+	ABORT_ON_ERROR: 0
+	ALLOW_UNSUPPORTED_FORMATS: 0
+	BATCH_SIZE: 0
+	DEBUG_ACTION:
+	DEFAULT_ORDER_BY_LIMIT: -1
+	DISABLE_CODEGEN: 0
+	HBASE_CACHE_BLOCKS: 0
+	HBASE_CACHING: 0
+	MAX_ERRORS: 0
+	MAX_IO_BUFFERS: 0
+	MAX_SCAN_RANGE_LENGTH: 0
+	MEM_LIMIT: 0
+	NUM_NODES: 0
+	NUM_SCANNER_THREADS: 0
+	PARQUET_COMPRESSION_CODEC: SNAPPY
+	PARQUET_FILE_SIZE: 0
+	SUPPORT_START_OVER: false
+-->
+
+    <p outputclass="toc"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_set.xml#set"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_query_timeout_s.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_query_timeout_s.xml b/docs/topics/impala_query_timeout_s.xml
new file mode 100644
index 0000000..41f2918
--- /dev/null
+++ b/docs/topics/impala_query_timeout_s.xml
@@ -0,0 +1,51 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.0.0" id="query_timeout_s">
+
+  <title>QUERY_TIMEOUT_S Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Querying"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">QUERY_TIMEOUT_S query option</indexterm>
+      Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the
+      timeout value are automatically cancelled. If the system administrator specified the
+      <codeph>--idle_query_timeout</codeph> startup option, <codeph>QUERY_TIMEOUT_S</codeph> must be smaller than
+      or equal to the <codeph>--idle_query_timeout</codeph> value.
+    </p>
+
+    <note conref="../shared/impala_common.xml#common/timeout_clock_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>SET QUERY_TIMEOUT_S=<varname>seconds</varname>;</codeblock>
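+
+    <p>
+      A minimal illustration of the syntax, assuming an arbitrary value of 10 minutes chosen for this sketch.
+      If <codeph>--idle_query_timeout</codeph> is in effect, the value must not exceed that setting.
+    </p>
+
+<codeblock>-- Cancel queries in this session that sit idle for more than 600 seconds.
+set query_timeout_s=600;
+</codeblock>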
+
+<!-- Don't have a compelling example to show at this time because the 'idle' aspect only applies
+     when the client is careless and leaves the query open. Can't easily demonstrate in impala-shell.
+
+     <p conref="/Content/impala_common_xi44078.xml#common/example_blurb"/>
+-->
+
+    <p>
+      <b>Type:</b> numeric
+    </p>
+
+    <p>
+      <b>Default:</b> 0 (no timeout if <codeph>--idle_query_timeout</codeph> not in effect; otherwise, use
+      <codeph>--idle_query_timeout</codeph> value)
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_timeouts.xml#timeouts"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_real.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_real.xml b/docs/topics/impala_real.xml
new file mode 100644
index 0000000..e6430e3
--- /dev/null
+++ b/docs/topics/impala_real.xml
@@ -0,0 +1,46 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="real">
+
+  <title>REAL Data Type</title>
+  <titlealts><navtitle>REAL</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Data Types"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      An alias for the <codeph>DOUBLE</codeph> data type. See <xref href="impala_double.xml#double"/> for details.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      These examples show how you can use the type names <codeph>REAL</codeph> and <codeph>DOUBLE</codeph>
+      interchangeably. Behind the scenes, Impala always treats both as <codeph>DOUBLE</codeph>.
+    </p>
+
+<codeblock>[localhost:21000] &gt; create table r1 (x real);
+[localhost:21000] &gt; describe r1;
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | double |         |
++------+--------+---------+
+[localhost:21000] &gt; insert into r1 values (1.5), (cast (2.2 as double));
+[localhost:21000] &gt; select cast (1e6 as real);
++---------------------------+
+| cast(1000000.0 as double) |
++---------------------------+
+| 1000000                   |
++---------------------------+</codeblock>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_refresh.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_refresh.xml b/docs/topics/impala_refresh.xml
new file mode 100644
index 0000000..ee022d5
--- /dev/null
+++ b/docs/topics/impala_refresh.xml
@@ -0,0 +1,234 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="refresh">
+
+  <title>REFRESH Statement</title>
+  <titlealts><navtitle>REFRESH</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Hive"/>
+      <data name="Category" value="Metastore"/>
+      <data name="Category" value="ETL"/>
+      <data name="Category" value="Ingest"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">REFRESH statement</indexterm>
+      To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are
+      connected through <cmdname>impala-shell</cmdname>, JDBC, or ODBC) must have current metadata about those
+      databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses
+      metadata and how it shares the same metastore database as Hive, see
+      <xref href="impala_hadoop.xml#intro_metastore"/> for background information.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>REFRESH [<varname>db_name</varname>.]<varname>table_name</varname></codeblock>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      Use the <codeph>REFRESH</codeph> statement to load the latest metastore metadata and block location data for
+      a particular table in these scenarios:
+    </p>
+
+    <ul>
+      <li>
+        After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL
+        pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why
+        metadata needs to be refreshed.)
+      </li>
+
+      <li>
+        After issuing <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>, or
+        another table-modifying SQL statement in Hive.
+      </li>
+    </ul>
+
+    <p>
+      You only need to issue the <codeph>REFRESH</codeph> statement on the node to which you connect to issue
+      queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read
+      requests for the correct HDFS blocks without relying on the metadata on the other nodes.
+    </p>
+
+    <p>
+      <codeph>REFRESH</codeph> reloads the metadata for the table from the metastore database, and does an
+      incremental reload of the low-level block location data to account for any new data files added to the HDFS
+      data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common
+      scenario where new data files are added to HDFS.
+    </p>
+
+    <p>
+      Only the metadata for the specified table is flushed. The table must already exist and be known to Impala,
+      either because the <codeph>CREATE TABLE</codeph> statement was run in Impala rather than Hive, or because a
+      previous <codeph>INVALIDATE METADATA</codeph> statement caused Impala to reload its entire metadata catalog.
+    </p>
+
+    <note>
+      <p rev="1.2">
+        In Impala 1.2 and higher, the catalog service broadcasts any changed metadata as a result of Impala
+        <codeph>ALTER TABLE</codeph>, <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph> statements to all
+        Impala nodes. Thus, the <codeph>REFRESH</codeph> statement is only required if you load data through Hive
+        or by manipulating data files in HDFS directly. See <xref href="impala_components.xml#intro_catalogd"/> for
+        more information on the catalog service.
+      </p>
+      <p rev="1.2.1">
+        In Impala 1.2.1 and higher, another way to avoid inconsistency across nodes is to enable the
+        <codeph>SYNC_DDL</codeph> query option before performing a DDL statement or an <codeph>INSERT</codeph> or
+        <codeph>LOAD DATA</codeph>.
+      </p>
+      <p>
+        The functionality of the <codeph>REFRESH</codeph> statement has changed in Impala 1.1 and higher. Now the
+        table name is a required parameter. To flush the metadata for all tables, use the
+        <codeph><xref href="impala_invalidate_metadata.xml#invalidate_metadata">INVALIDATE METADATA</xref></codeph>
+        command.
+      </p>
+      <draft-comment translate="no"> Almost-identical wording here, under INVALIDATE METADATA, and in Release Notes :: New Features. Makes sense to conref. </draft-comment>
+      <p>
+        Because <codeph>REFRESH <varname>table_name</varname></codeph> only works for tables that Impala is already
+        aware of, when you create a new table in the Hive shell, you must enter <codeph>INVALIDATE
+        METADATA</codeph> with no table parameter before you can see the new table in
+        <cmdname>impala-shell</cmdname>. Once the table is known to Impala, you can issue <codeph>REFRESH
+        <varname>table_name</varname></codeph> as needed after you add more data files for that table.
+      </p>
+    </note>
+
+    <p conref="../shared/impala_common.xml#common/refresh_vs_invalidate"/>
+
+    <p>
+      A metadata update for an <codeph>impalad</codeph> instance <b>is</b> required if:
+    </p>
+
+    <ul>
+      <li>
+        A metadata change occurs.
+      </li>
+
+      <li>
+        <b>and</b> the change is made through Hive.
+      </li>
+
+      <li>
+        <b>and</b> the change is made to a database to which clients such as the Impala shell or ODBC directly
+        connect.
+      </li>
+    </ul>
+
+    <p rev="1.2">
+      A metadata update for an Impala node is <b>not</b> required after you run <codeph>ALTER TABLE</codeph>,
+      <codeph>INSERT</codeph>, or another table-modifying statement in Impala rather than Hive. Impala handles the
+      metadata synchronization automatically through the catalog service.
+    </p>
+
+    <p>
+      Database and table metadata is typically modified by:
+    </p>
+
+    <ul>
+      <li>
+        Hive - through <codeph>ALTER</codeph>, <codeph>CREATE</codeph>, <codeph>DROP</codeph> or
+        <codeph>INSERT</codeph> operations.
+      </li>
+
+      <li>
+        Impalad - through <codeph>CREATE TABLE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph>
+        operations. <ph rev="1.2">In Impala 1.2 and higher, such changes are propagated to all Impala nodes by the
+        Impala catalog service.</ph>
+      </li>
+    </ul>
+
+    <p>
+      <codeph>REFRESH</codeph> causes the metadata for that table to be immediately reloaded. For a huge table,
+      that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable
+      delay later, for example if the next reference to the table is during a benchmark test.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example shows how you might use the <codeph>REFRESH</codeph> statement after manually adding
+      new HDFS data files to the Impala data directory for a table:
+    </p>
+
+<codeblock>[impalad-host:21000] &gt; refresh t1;
+[impalad-host:21000] &gt; refresh t2;
+[impalad-host:21000] &gt; select * from t1;
+...
+[impalad-host:21000] &gt; select * from t2;
+... </codeblock>
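+
+    <p>
+      As a further sketch, assuming a hypothetical table <codeph>new_table</codeph> that was created through
+      Hive, the sequence described in the usage notes might look like this:
+    </p>
+
+<codeblock>-- The table was created in Hive, so Impala is not yet aware of it.
+invalidate metadata;
+
+-- Later, after more data files are added for the table, a single-table refresh is enough.
+refresh new_table;
+select count(*) from new_table;
+</codeblock>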
+
+    <p>
+      For more examples of using <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> with a
+      combination of Impala and Hive operations, see <xref href="impala_tutorial.xml#tutorial_impala_hive"/>.
+    </p>
+
+    <p>
+      <b>Related impala-shell options:</b>
+    </p>
+
+    <p>
+      In Impala 1.0, the <codeph>-r</codeph> option of <cmdname>impala-shell</cmdname> issued
+      <codeph>REFRESH</codeph> to reload metadata for all tables.
+    </p>
+
+    <p>
+      In Impala 1.1 and higher, this option issues <codeph>INVALIDATE METADATA</codeph> because
+      <codeph>REFRESH</codeph> now requires a table name parameter. Due to the expense of reloading the metadata
+      for all tables, the <cmdname>impala-shell</cmdname> <codeph>-r</codeph> option is not recommended for
+      day-to-day use in a production environment.
+    </p>
+
+    <p rev="1.2">
+      In Impala 1.2 and higher, the <codeph>-r</codeph> option is needed even less frequently, because metadata
+      changes caused by SQL statements in Impala are automatically broadcast to all Impala nodes.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+      typically the <codeph>impala</codeph> user, must have execute
+      permissions for all the relevant directories holding table data.
+      (A table could have data spread across multiple directories,
+      or in unexpected paths, if it uses partitioning or
+      specifies a <codeph>LOCATION</codeph> attribute for
+      individual partitions or the entire table.)
+      Issues with permissions might not cause an immediate error for this statement,
+      but subsequent statements such as <codeph>SELECT</codeph>
+      or <codeph>SHOW TABLE STATS</codeph> could fail.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/hdfs_blurb"/>
+
+    <p>
+      The <codeph>REFRESH</codeph> command checks HDFS permissions of the underlying data files and directories,
+      caching this information so that a statement can be cancelled immediately if for example the
+      <codeph>impala</codeph> user does not have permission to write to the data directory for the table. Impala
+      reports any lack of write permissions as an <codeph>INFO</codeph> message in the log file, in case that
+      represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala
+      user, issue another <codeph>REFRESH</codeph> to make Impala aware of the change.
+    </p>
+
+    <note conref="../shared/impala_common.xml#common/compute_stats_next"/>
+
+    <p conref="../shared/impala_common.xml#common/s3_blurb"/>
+    <p conref="../shared/impala_common.xml#common/s3_metadata"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+    <p>
+      <xref href="impala_hadoop.xml#intro_metastore"/>,
+      <xref href="impala_invalidate_metadata.xml#invalidate_metadata"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_request_pool.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_request_pool.xml b/docs/topics/impala_request_pool.xml
new file mode 100644
index 0000000..cf2a811
--- /dev/null
+++ b/docs/topics/impala_request_pool.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.3.0" id="request_pool">
+
+  <title>REQUEST_POOL Query Option</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Resource Management"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Admission Control"/>
+      <data name="Category" value="YARN"/>
+      <data name="Category" value="Llama"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">REQUEST_POOL query option</indexterm>
+      The pool or queue name that queries should be submitted to. Only applies when you enable the Impala admission
+      control feature (CDH 4 or CDH 5; see <xref href="impala_admission.xml#admission_control"/>), or the YARN
+      resource management feature (CDH 5 only; see
+      <xref href="impala_resource_management.xml#resource_management"/>). Specifies the name of the pool used by
+      requests from Impala to the resource manager.
+    </p>
+
+    <p>
+      Formerly known as <codeph>YARN_POOL</codeph> during the CDH 5 beta period. Renamed to reflect that it can be
+      used both with YARN and with the lightweight admission control feature introduced in Impala 1.3.
+    </p>
+
+    <p>
+      <b>Type:</b> <codeph>STRING</codeph>
+    </p>
+
+    <p>
+      <b>Default:</b> empty (use the user-to-pool mapping defined by an <cmdname>impalad</cmdname> startup option
+      in the Impala configuration file)
+    </p>
+
+<!-- Worth adding a couple of related info links here. -->
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_reservation_request_timeout.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_reservation_request_timeout.xml b/docs/topics/impala_reservation_request_timeout.xml
new file mode 100644
index 0000000..0316e44
--- /dev/null
+++ b/docs/topics/impala_reservation_request_timeout.xml
@@ -0,0 +1,35 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="1.2" id="reservation_request_timeout">
+
+  <title>RESERVATION_REQUEST_TIMEOUT Query Option (CDH 5 only)</title>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Resource Management"/>
+      <data name="Category" value="YARN"/>
+      <data name="Category" value="Llama"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">RESERVATION_REQUEST_TIMEOUT query option</indexterm>
+      Maximum number of milliseconds Impala will wait for a reservation to be completely granted or denied. Used in
+      conjunction with the Impala resource management feature in Impala 1.2 and higher with CDH 5.
+    </p>
+
+    <p>
+      <b>Type:</b> numeric
+    </p>
+
+    <p>
+      <b>Default:</b> 300000 (5 minutes)
+    </p>
+
+<!-- Worth adding a couple of related info links here. -->
+
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_revoke.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_revoke.xml b/docs/topics/impala_revoke.xml
new file mode 100644
index 0000000..88fbbf9
--- /dev/null
+++ b/docs/topics/impala_revoke.xml
@@ -0,0 +1,96 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.0.0" id="revoke">
+
+  <title>REVOKE Statement (CDH 5.2 or higher only)</title>
+  <titlealts><navtitle>REVOKE (CDH 5.2 or higher only)</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="DDL"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Sentry"/>
+      <data name="Category" value="Roles"/>
+      <!-- Consider whether to go deeper into categories like Security for the Sentry-related statements. -->
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">REVOKE statement</indexterm>
+<!-- Copied from Sentry docs. Turn into conref. I did some rewording for clarity. -->
+      The <codeph>REVOKE</codeph> statement revokes roles from groups, or privileges on a specified object from
+      roles. Only Sentry administrative users can revoke a role from a group. The revocation has a cascading effect. For
+      example, revoking the <codeph>ALL</codeph> privilege on a database also revokes the same privilege for all
+      the tables in that database.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock rev="2.3.0 collevelauth">REVOKE ROLE <varname>role_name</varname> FROM GROUP <varname>group_name</varname>
+
+REVOKE <varname>privilege</varname> ON <varname>object_type</varname> <varname>object_name</varname>
+  FROM [ROLE] <varname>role_name</varname>
+
+<ph rev="2.3.0">privilege ::= SELECT | SELECT(<varname>column_name</varname>) | INSERT | ALL</ph>
+object_type ::= TABLE | DATABASE | SERVER | URI
+</codeblock>
+
+    <p>
+      Typically, the object name is an identifier. For URIs, it is a string literal.
+    </p>
+
+    <p rev="2.3.0 collevelauth">
+      The ability to grant or revoke <codeph>SELECT</codeph> privilege on specific columns is available
+      in CDH 5.5 / Impala 2.3 and higher. See <xref href="sg_hive_sql.xml#concept_c2q_4qx_p4/col_level_auth_sentry"/>
+      for details.
+    </p>
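+
+    <p>
+      The following statements sketch each form of the syntax shown above. All role, group, database, table,
+      and column names, and the URI, are hypothetical placeholders chosen for illustration.
+    </p>
+
+<codeblock>-- Remove a role from a group.
+REVOKE ROLE analyst_role FROM GROUP analysts;
+
+-- Remove privileges on a table or database from a role. The ROLE keyword is optional.
+REVOKE INSERT ON TABLE sales_db.orders FROM ROLE analyst_role;
+REVOKE ALL ON DATABASE sales_db FROM analyst_role;
+
+-- Column-level form (CDH 5.5 / Impala 2.3 and higher).
+REVOKE SELECT(customer_id) ON TABLE sales_db.orders FROM ROLE analyst_role;
+
+-- For a URI, the object name is a string literal.
+REVOKE ALL ON URI 'hdfs://namenode-host:8020/path/to/staging' FROM ROLE analyst_role;
+</codeblock>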
+
+    <p conref="../shared/impala_common.xml#common/privileges_blurb"/>
+
+    <p>
+      Only administrative users (those with <codeph>ALL</codeph> privileges on the server, defined in the Sentry
+      policy file) can use this statement.
+    </p>
+
+<!-- Turn compatibility info into a conref or series of conrefs. (In both GRANT and REVOKE.) -->
+
+    <p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
+
+    <p>
+      <ul>
+        <li>
+          The Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements are available in CDH 5.2 and
+          higher.
+        </li>
+
+        <li>
+          In CDH 5.1 and higher, Impala makes use of any roles and privileges specified by the
+          <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements in Hive, when your system is configured to
+          use the Sentry service instead of the file-based policy mechanism.
+        </li>
+
+        <li>
+          The Impala <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements do not require the
+          <codeph>ROLE</codeph> keyword to be repeated before each role name, unlike the equivalent Hive
+          statements.
+        </li>
+
+        <li conref="../shared/impala_common.xml#common/grant_revoke_single"/>
+      </ul>
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      <xref href="impala_authorization.xml#authorization"/>, <xref href="impala_grant.xml#grant"/>,
+      <xref href="impala_create_role.xml#create_role"/>, <xref href="impala_drop_role.xml#drop_role"/>,
+      <xref href="impala_show.xml#show"/>
+    </p>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_schema_objects.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_schema_objects.xml b/docs/topics/impala_schema_objects.xml
new file mode 100644
index 0000000..d8abe12
--- /dev/null
+++ b/docs/topics/impala_schema_objects.xml
@@ -0,0 +1,57 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="schema_objects">
+
+  <title>Impala Schema Objects and Object Names</title>
+  <titlealts><navtitle>Schema Objects and Object Names</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Schemas"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">schema objects</indexterm>
+      With Impala, you work with schema objects that are familiar to database users: primarily databases, tables, views,
+      and functions. The SQL syntax to work with these objects is explained in
+      <xref href="impala_langref_sql.xml#langref_sql"/>. This section explains the conceptual knowledge you need to
+      work with these objects and the various ways to specify their names.
+    </p>
+
+    <p>
+      Within a table, partitions can also be considered a kind of object. Partitioning is an important subject for
+      Impala, with its own documentation section covering use cases and performance considerations. See
+      <xref href="impala_partitioning.xml#partitioning"/> for details.
+    </p>
+
+    <p>
+      Impala does not have a counterpart of the <q>tablespace</q> notion from some database systems. By default,
+      all the data files for a database, table, or partition are located within nested folders within the HDFS file
+      system. You can also specify a particular HDFS location for a given Impala table or partition. The raw data
+      for these objects is represented as a collection of data files, providing the flexibility to load data by
+      simply moving files into the expected HDFS location.
+    </p>
+
+    <p>
+      Information about the schema objects is held in the
+      <xref href="impala_hadoop.xml#intro_metastore">metastore</xref> database. This database is shared between
+      Impala and Hive, allowing each to create, drop, and query each other's databases, tables, and so on. When
+      Impala makes a change to schema objects through a <codeph>CREATE</codeph>, <codeph>ALTER</codeph>,
+      <codeph>DROP</codeph>, <codeph>INSERT</codeph>, or <codeph>LOAD DATA</codeph> statement, it broadcasts those
+      changes to all nodes in the cluster through the <xref href="impala_components.xml#intro_catalogd">catalog
+      service</xref>. When you make such changes through Hive or directly through manipulating HDFS files, you use
+      the <xref href="impala_refresh.xml#refresh">REFRESH</xref> or
+      <xref href="impala_invalidate_metadata.xml#invalidate_metadata">INVALIDATE METADATA</xref> statements on the
+      Impala side to recognize the newly loaded data, new tables, and so on.
+    </p>
+
+    <p outputclass="toc"/>
+  </conbody>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_select.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_select.xml b/docs/topics/impala_select.xml
new file mode 100644
index 0000000..db63f71
--- /dev/null
+++ b/docs/topics/impala_select.xml
@@ -0,0 +1,203 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="select">
+
+  <title>SELECT Statement</title>
+  <titlealts><navtitle>SELECT</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Reports"/>
+      <data name="Category" value="Tables"/>
+      <data name="Category" value="Data Analysts"/>
+      <data name="Category" value="Developers"/>
+      <!-- This is such an important statement, think if there are more applicable categories. -->
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">SELECT statement</indexterm>
+      The <codeph>SELECT</codeph> statement performs queries, retrieving data from one or more tables and producing
+      result sets consisting of rows and columns.
+    </p>
+
+    <p>
+      The Impala <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph> statement also typically ends
+      with a <codeph>SELECT</codeph> statement, to define data to copy from one table to another.
+    </p>
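+
+    <p>
+      For illustration, a statement of that form might look like the following, assuming hypothetical tables
+      <codeph>staging_events</codeph> and <codeph>events</codeph> with compatible column definitions:
+    </p>
+
+<codeblock>-- Copy selected rows from one table to another. Table and column names are hypothetical.
+INSERT INTO events
+  SELECT event_id, event_time, payload FROM staging_events WHERE event_time IS NOT NULL;
+</codeblock>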
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>[WITH <i>name</i> AS (<i>select_expression</i>) [, ...] ]
+SELECT
+  [ALL | DISTINCT]
+  [STRAIGHT_JOIN]
+  <i>expression</i> [, <i>expression</i> ...]
+FROM <i>table_reference</i> [, <i>table_reference</i> ...]
+[[FULL | [LEFT | RIGHT] INNER | [LEFT | RIGHT] OUTER | [LEFT | RIGHT] SEMI | [LEFT | RIGHT] ANTI | CROSS]
+  JOIN <i>table_reference</i>
+  [ON <i>join_equality_clauses</i> | USING (<varname>col1</varname>[, <varname>col2</varname> ...])]] ...
+WHERE <i>conditions</i>
+GROUP BY { <i>column</i> | <i>expression</i> [, ...] }
+HAVING <i>conditions</i>
+ORDER BY { <i>column</i> | <i>expression</i> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] }
+LIMIT <i>expression</i> [OFFSET <i>expression</i>]
+[UNION [ALL] <i>select_statement</i>] ...
+</codeblock>
+
+    <p>
+      Impala <codeph>SELECT</codeph> queries support:
+    </p>
+
+    <ul>
+      <li>
+        SQL scalar data types: <codeph><xref href="impala_boolean.xml#boolean">BOOLEAN</xref></codeph>,
+        <codeph><xref href="impala_tinyint.xml#tinyint">TINYINT</xref></codeph>,
+        <codeph><xref href="impala_smallint.xml#smallint">SMALLINT</xref></codeph>,
+        <codeph><xref href="impala_int.xml#int">INT</xref></codeph>,
+        <codeph><xref href="impala_bigint.xml#bigint">BIGINT</xref></codeph>,
+        <codeph><xref href="impala_decimal.xml#decimal">DECIMAL</xref></codeph>,
+        <codeph><xref href="impala_float.xml#float">FLOAT</xref></codeph>,
+        <codeph><xref href="impala_double.xml#double">DOUBLE</xref></codeph>,
+        <codeph><xref href="impala_timestamp.xml#timestamp">TIMESTAMP</xref></codeph>,
+        <codeph><xref href="impala_string.xml#string">STRING</xref></codeph>,
+        <codeph><xref href="impala_varchar.xml#varchar">VARCHAR</xref></codeph>,
+        <codeph><xref href="impala_char.xml#char">CHAR</xref></codeph>.
+      </li>
+
+<!-- To do: Consider promoting 'querying complex types' to its own subtopic or pseudo-heading. -->
+      <li rev="2.3.0">
+        The complex data types <codeph>ARRAY</codeph>, <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>
+        are available in CDH 5.5 / Impala 2.3 and higher.
+        Queries involving these types typically involve special qualified names
+        using dot notation for referring to the complex column fields,
+        and join clauses for bringing the complex columns into the result set.
+        See <xref href="impala_complex_types.xml#complex_types"/> for details.
+      </li>
+
+      <li rev="1.1">
+        An optional <xref href="impala_with.xml#with"><codeph>WITH</codeph> clause</xref> before the
+        <codeph>SELECT</codeph> keyword, to define a subquery whose name or column names can be referenced from
+        later in the main query. This clause lets you abstract repeated clauses, such as aggregation functions,
+        that are referenced multiple times in the same query.
+      </li>
+
+      <li>
+        By default, one <codeph>DISTINCT</codeph> clause per query. See <xref href="impala_distinct.xml#distinct"/>
+        for details. See <xref href="impala_appx_count_distinct.xml#appx_count_distinct"/> for a query option to
+        allow multiple <codeph>COUNT(DISTINCT)</codeph> expressions in the same query.
+      </li>
+
+      <li>
+        Subqueries in a <codeph>FROM</codeph> clause. In CDH 5.2 / Impala 2.0 and higher,
+        subqueries can also go in the <codeph>WHERE</codeph> clause, for example with the
+        <codeph>IN()</codeph>, <codeph>EXISTS</codeph>, and <codeph>NOT EXISTS</codeph> operators.
+      </li>
+
+      <li>
+        <codeph>WHERE</codeph>, <codeph>GROUP BY</codeph>, and <codeph>HAVING</codeph> clauses.
+      </li>
+
+      <li rev="obwl">
+        <codeph><xref href="impala_order_by.xml#order_by">ORDER BY</xref></codeph>. Prior to Impala 1.4.0, Impala
+        required that queries using an <codeph>ORDER BY</codeph> clause also include a
+        <codeph><xref href="impala_limit.xml#limit">LIMIT</xref></codeph> clause. In Impala 1.4.0 and higher, this
+        restriction is lifted; sort operations that would exceed the Impala memory limit automatically use a
+        temporary disk work area to perform the sort.
+      </li>
+
+      <li>
+        <p conref="../shared/impala_common.xml#common/join_types"/>
+        <p>
+          See <xref href="impala_joins.xml#joins"/> for details and examples of join queries.
+        </p>
+      </li>
+
+      <li>
+        <codeph>UNION</codeph> and <codeph>UNION ALL</codeph>.
+      </li>
+
+      <li>
+        <codeph>LIMIT</codeph>.
+      </li>
+
+      <li>
+        External tables.
+      </li>
+
+      <li>
+        Relational operators such as greater than, less than, or equal to.
+      </li>
+
+      <li>
+        Arithmetic operators such as addition or subtraction.
+      </li>
+
+      <li>
+        Logical/Boolean operators <codeph>AND</codeph>, <codeph>OR</codeph>, and <codeph>NOT</codeph>. Impala does
+        not support the corresponding symbols <codeph>&amp;&amp;</codeph>, <codeph>||</codeph>, and
+        <codeph>!</codeph>.
+      </li>
+
+      <li>
+        Common SQL built-in functions and operators such as <codeph>COUNT</codeph>, <codeph>SUM</codeph>,
+        <codeph>CAST</codeph>, <codeph>LIKE</codeph>, <codeph>IN</codeph>, <codeph>BETWEEN</codeph>, and
+        <codeph>COALESCE</codeph>. Impala supports the built-in functions described in <xref href="impala_functions.xml#builtins"/>.
+      </li>
+    </ul>
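+
+    <p>
+      The following sketch combines several of the clauses listed above in a single query. The tables
+      <codeph>orders</codeph>, <codeph>customers</codeph>, and <codeph>active_regions</codeph> and their columns
+      are hypothetical, and the subquery in the <codeph>WHERE</codeph> clause assumes Impala 2.0 or higher:
+    </p>
+
+<codeblock>-- WITH clause names a subquery that the main query then joins against.
+WITH big_orders AS (SELECT customer_id, total FROM orders WHERE total &gt; 100)
+SELECT c.region, COUNT(*) AS num_orders, SUM(b.total) AS revenue
+FROM big_orders b JOIN customers c ON b.customer_id = c.customer_id
+-- Subquery in the WHERE clause, using the IN operator.
+WHERE c.region IN (SELECT region FROM active_regions)
+GROUP BY c.region
+HAVING SUM(b.total) &gt; 1000
+ORDER BY revenue DESC NULLS LAST
+LIMIT 10;
+</codeblock>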
+
+    <p conref="../shared/impala_common.xml#common/ignore_file_extensions"/>
+
+    <p conref="../shared/impala_common.xml#common/security_blurb"/>
+    <p conref="../shared/impala_common.xml#common/redaction_yes"/>
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_yes"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb"/>
+    <p rev="CDH-19187">
+      The user ID that the <cmdname>impalad</cmdname> daemon runs under,
+      typically the <codeph>impala</codeph> user, must have read
+      permissions for the files in all applicable directories in all source tables,
+      and read and execute permissions for the relevant data directories.
+      (A <codeph>SELECT</codeph> operation could read files from multiple different HDFS directories
+      if the source table is partitioned.)
+      If a query attempts to read a data file and is unable to because of an HDFS permission error,
+      the query halts and does not return any further results.
+    </p>
+
+    <p outputclass="toc"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      The <codeph>SELECT</codeph> syntax is so extensive that it forms its own category of statements: queries. The
+      other major classifications of SQL statements are data definition language (see
+      <xref href="impala_ddl.xml#ddl"/>) and data manipulation language (see <xref href="impala_dml.xml#dml"/>).
+    </p>
+
+    <p>
+      Because the focus of Impala is on fast queries with interactive response times over huge data sets, query
+      performance and scalability are important considerations. See
+      <xref href="impala_performance.xml#performance"/> and <xref href="impala_scalability.xml#scalability"/> for
+      details.
+    </p>
+  </conbody>
+
+  <concept id="where" audience="Cloudera">
+
+<!-- WHERE hidden for the moment until there's the chance to add some reasonably comprehensive content
+
+     and make it its own file. -->
+
+    <title>WHERE Clause</title>
+
+    <conbody>
+
+      <p/>
+    </conbody>
+  </concept>
+</concept>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/463ddf92/docs/topics/impala_set.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_set.xml b/docs/topics/impala_set.xml
new file mode 100644
index 0000000..afa6777
--- /dev/null
+++ b/docs/topics/impala_set.xml
@@ -0,0 +1,90 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.0.0" id="set">
+
+  <title>SET Statement</title>
+  <titlealts><navtitle>SET</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="SQL"/>
+      <data name="Category" value="Querying"/>
+      <data name="Category" value="Configuring"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      <indexterm audience="Cloudera">SET statement</indexterm>
+      Specifies values for query options that control the runtime behavior of other statements within the same
+      session.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>SET [<varname>query_option</varname>=<varname>option_value</varname>]
+</codeblock>
+
+    <p>
+      <codeph>SET</codeph> with no arguments returns a result set consisting of all available query options and
+      their current values.
+    </p>
+
+    <p>
+      The query option name and any string argument values are case-insensitive.
+    </p>
+
+    <p>
+      Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and
+      disabled by assigning either <codeph>true</codeph> or <codeph>false</codeph>, or equivalently
+      <codeph>1</codeph> or <codeph>0</codeph>. Some numeric options accept a final character signifying the unit,
+      such as <codeph>2g</codeph> for 2 gigabytes or <codeph>100m</codeph> for 100 megabytes. See
+      <xref href="impala_query_options.xml#query_options"/> for the details of each query option.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
+
+    <p>
+      <codeph>MEM_LIMIT</codeph> is probably the most commonly used query option. You can specify a high value to
+      allow a resource-intensive query to complete. For testing how queries would work on memory-constrained
+      systems, you might specify an artificially low value.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
+
+    <p conref="../shared/impala_common.xml#common/example_blurb"/>
+
+    <p>
+      The following example sets some numeric and some Boolean query options to control usage of memory, disk
+      space, and timeout periods, then runs a query whose success could depend on the options in effect:
+    </p>
+
+<codeblock>set mem_limit=64g;
+set DISABLE_UNSAFE_SPILLS=true;
+set parquet_file_size=400m;
+set RESERVATION_REQUEST_TIMEOUT=900000;
+insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2;
+</codeblock>
+
+    <p conref="../shared/impala_common.xml#common/added_in_20"/>
+
+    <p>
+      <codeph>SET</codeph> has always been available as an <cmdname>impala-shell</cmdname> command. Promoting it to
+      a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs.
+    </p>
+
+<!-- <p conref="/Content/impala_common_xi44078.xml#common/jdbc_blurb"/> -->
+
+    <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
+
+    <p conref="../shared/impala_common.xml#common/related_info"/>
+
+    <p>
+      See <xref href="impala_query_options.xml#query_options"/> for the query options you can adjust using this
+      statement.
+    </p>
+  </conbody>
+</concept>