You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:21 UTC

[12/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_planning.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_planning.html b/docs/build3x/html/topics/impala_planning.html
new file mode 100644
index 0000000..e571e42
--- /dev/null
+++ b/docs/build3x/html/topics/impala_planning.html
@@ -0,0 +1,20 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala Deployment</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      Before you set up Impala in production, do some planning to make sure that your hardware setup has sufficient
+      capacity, that your cluster topology is optimal for Impala queries, and that your schema design and ETL
+      processes follow the best practices for Impala.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_porting.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_porting.html b/docs/build3x/html/topics/impala_porting.html
new file mode 100644
index 0000000..8a8ba7e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_porting.html
@@ -0,0 +1,603 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="porting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Porting SQL from Other Database Systems to Impala</title></head><body id="porting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Porting SQL from Other Database Systems to Impala</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      Although Impala uses standard SQL for queries, you might need to modify SQL source when bringing applications
+      to Impala, due to variations in data types, built-in functions, vendor language extensions, and
+      Hadoop-specific syntax. Even when SQL is working correctly, you might make further minor modifications for
+      best performance.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="porting__porting_ddl_dml">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Porting DDL and DML Statements</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When adapting SQL code from a traditional database system to Impala, expect to find a number of differences
+        in the DDL statements that you use to set up the schema. Clauses related to physical layout of files,
+        tablespaces, and indexes have no equivalent in Impala. You might restructure your schema considerably to
+        account for the Impala partitioning scheme and Hadoop file formats.
+      </p>
+
+      <p class="p">
+        Expect SQL queries to have a much higher degree of compatibility. With modest rewriting to address vendor
+        extensions and features not yet supported in Impala, you might be able to run identical or almost-identical
+        query text on both systems.
+      </p>
+
+      <p class="p">
+        Therefore, consider separating out the DDL into a separate Impala-specific setup script. Focus your reuse
+        and ongoing tuning efforts on the code for SQL queries.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="porting__porting_data_types">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Porting Data Types from Other Database Systems</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Change any <code class="ph codeph">VARCHAR</code>, <code class="ph codeph">VARCHAR2</code>, and <code class="ph codeph">CHAR</code> columns to
+            <code class="ph codeph">STRING</code>. Remove any length constraints from the column declarations; for example,
+            change <code class="ph codeph">VARCHAR(32)</code> or <code class="ph codeph">CHAR(1)</code> to <code class="ph codeph">STRING</code>. Impala is
+            very flexible about the length of string values; it does not impose any length constraints
+            or do any special processing (such as blank-padding) for <code class="ph codeph">STRING</code> columns.
+            (In Impala 2.0 and higher, there are data types <code class="ph codeph">VARCHAR</code> and <code class="ph codeph">CHAR</code>,
+            with length constraints for both types and blank-padding for <code class="ph codeph">CHAR</code>.
+            However, for performance reasons, it is still preferable to use <code class="ph codeph">STRING</code>
+            columns where practical.)
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For national language character types such as <code class="ph codeph">NCHAR</code>, <code class="ph codeph">NVARCHAR</code>, or
+            <code class="ph codeph">NCLOB</code>, be aware that while Impala can store and query UTF-8 character data, currently
+            some string manipulation operations only work correctly with ASCII data. See
+            <a class="xref" href="impala_string.html#string">STRING Data Type</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Change any <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>, or <code class="ph codeph">TIME</code> columns to
+            <code class="ph codeph">TIMESTAMP</code>. Remove any precision constraints. Remove any timezone clauses, and make
+            sure your application logic or ETL process accounts for the fact that Impala expects all
+            <code class="ph codeph">TIMESTAMP</code> values to be in
+            <a class="xref" href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time" target="_blank">Coordinated
+            Universal Time (UTC)</a>. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about
+            the <code class="ph codeph">TIMESTAMP</code> data type, and
+            <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for conversion functions for different
+            date and time formats.
+          </p>
+          <p class="p">
+            You might also need to adapt date- and time-related literal values and format strings to use the
+            supported Impala date and time formats. If you have date and time literals with different separators or
+            different numbers of <code class="ph codeph">YY</code>, <code class="ph codeph">MM</code>, and so on placeholders than Impala
+            expects, consider using calls to <code class="ph codeph">regexp_replace()</code> to transform those values to the
+            Impala-compatible format. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about the
+            allowed formats for date and time literals, and
+            <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for string conversion functions such as
+            <code class="ph codeph">regexp_replace()</code>.
+          </p>
+          <p class="p">
+            Instead of <code class="ph codeph">SYSDATE</code>, call the function <code class="ph codeph">NOW()</code>.
+          </p>
+          <p class="p">
+            Instead of adding or subtracting directly from a date value to produce a value <var class="keyword varname">N</var>
+            days in the past or future, use an <code class="ph codeph">INTERVAL</code> expression, for example <code class="ph codeph">NOW() +
+            INTERVAL 30 DAYS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Although Impala supports <code class="ph codeph">INTERVAL</code> expressions for datetime arithmetic, as shown in
+            <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>, <code class="ph codeph">INTERVAL</code> is not available as a column
+            data type in Impala. For any <code class="ph codeph">INTERVAL</code> values stored in tables, convert them to numeric
+            values that you can add or subtract using the functions in
+            <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>. For example, if you had a table
+            <code class="ph codeph">DEADLINES</code> with an <code class="ph codeph">INT</code> column <code class="ph codeph">TIME_PERIOD</code>, you could
+            construct dates N days in the future like so:
+          </p>
+<pre class="pre codeblock"><code>SELECT NOW() + INTERVAL time_period DAYS from deadlines;</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For <code class="ph codeph">YEAR</code> columns, change to the smallest Impala integer type that has sufficient
+            range. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on
+            for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Change any <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">NUMBER</code> types. If fixed-point precision is not
+            required, you can use <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> on the Impala side depending on
+            the range of values. For applications that require precise decimal values, such as financial data, you
+            might need to make more extensive changes to table structure and application logic, such as using
+            separate integer columns for dollars and cents, or encoding numbers as string values and writing UDFs
+            to manipulate them. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges,
+            casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, and <code class="ph codeph">REAL</code> types are supported in
+            Impala. Remove any precision and scale specifications. (In Impala, <code class="ph codeph">REAL</code> is just an
+            alias for <code class="ph codeph">DOUBLE</code>; columns declared as <code class="ph codeph">REAL</code> are turned into
+            <code class="ph codeph">DOUBLE</code> behind the scenes.) See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for
+            details about ranges, casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Most integer types from other systems have equivalents in Impala, perhaps under different names such as
+            <code class="ph codeph">BIGINT</code> instead of <code class="ph codeph">INT8</code>. For any that are unavailable, for example
+            <code class="ph codeph">MEDIUMINT</code>, switch to the smallest Impala integer type that has sufficient range.
+            Remove any precision specifications. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details
+            about ranges, casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Remove any <code class="ph codeph">UNSIGNED</code> constraints. All Impala numeric types are signed. See
+            <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on for the
+            various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For any types holding bitwise values, use an integer type with enough range to hold all the relevant
+            bits within a positive integer. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about
+            ranges, casting, and so on for the various numeric data types.
+          </p>
+          <p class="p">
+            For example, <code class="ph codeph">TINYINT</code> has a maximum positive value of 127, not 256, so to manipulate
+            8-bit bitfields as positive numbers switch to the next largest type <code class="ph codeph">SMALLINT</code>.
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast(127*2 as tinyint);
++--------------------------+
+| cast(127 * 2 as tinyint) |
++--------------------------+
+| -2                       |
++--------------------------+
+[localhost:21000] &gt; select cast(128 as tinyint);
++----------------------+
+| cast(128 as tinyint) |
++----------------------+
+| -128                 |
++----------------------+
+[localhost:21000] &gt; select cast(127*2 as smallint);
++---------------------------+
+| cast(127 * 2 as smallint) |
++---------------------------+
+| 254                       |
++---------------------------+</code></pre>
+          <p class="p">
+            Impala does not support notation such as <code class="ph codeph">b'0101'</code> for bit literals.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For BLOB values, use <code class="ph codeph">STRING</code> to represent <code class="ph codeph">CLOB</code> or
+            <code class="ph codeph">TEXT</code> types (character based large objects) up to 32 KB in size. Binary large objects
+            such as <code class="ph codeph">BLOB</code>, <code class="ph codeph">RAW</code> <code class="ph codeph">BINARY</code>, and
+            <code class="ph codeph">VARBINARY</code> do not currently have an equivalent in Impala.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For Boolean-like types such as <code class="ph codeph">BOOL</code>, use the Impala <code class="ph codeph">BOOLEAN</code> type.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Because Impala currently does not support composite or nested types, any spatial data types in other
+            database systems do not have direct equivalents in Impala. You could represent spatial values in string
+            format and write UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details. Where
+            practical, separate spatial types into separate tables so that Impala can still work with the
+            non-spatial data.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Take out any <code class="ph codeph">DEFAULT</code> clauses. Impala can use data files produced from many different
+            sources, such as Pig, Hive, or MapReduce jobs. The fast import mechanisms of <code class="ph codeph">LOAD DATA</code>
+            and external tables mean that Impala is flexible about the format of data files, and Impala does not
+            necessarily validate or cleanse data before querying it. When copying data through Impala
+            <code class="ph codeph">INSERT</code> statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or
+            <code class="ph codeph">NVL</code> to substitute some other value for <code class="ph codeph">NULL</code> fields; see
+            <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Take out any constraints from your <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+            statements, for example <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">FOREIGN KEY</code>,
+            <code class="ph codeph">UNIQUE</code>, <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">UNSIGNED</code>, or
+            <code class="ph codeph">CHECK</code> constraints. Impala can use data files produced from many different sources,
+            such as Pig, Hive, or MapReduce jobs. Therefore, Impala expects initial data validation to happen
+            earlier during the ETL or ELT cycle. After data is loaded into Impala tables, you can perform queries
+            to test for <code class="ph codeph">NULL</code> values. When copying data through Impala <code class="ph codeph">INSERT</code>
+            statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or <code class="ph codeph">NVL</code> to
+            substitute some other value for <code class="ph codeph">NULL</code> fields; see
+            <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+          </p>
+          <p class="p">
+            Do as much verification as practical before loading data into Impala. After data is loaded into Impala,
+            you can do further verification using SQL queries to check if values have expected ranges, if values
+            are <code class="ph codeph">NULL</code> or not, and so on. If there is a problem with the data, you will need to
+            re-run earlier stages of the ETL process, or do an <code class="ph codeph">INSERT ... SELECT</code> statement in
+            Impala to copy the faulty data to a new table and transform or filter out the bad values.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Take out any <code class="ph codeph">CREATE INDEX</code>, <code class="ph codeph">DROP INDEX</code>, and <code class="ph codeph">ALTER
+            INDEX</code> statements, and equivalent <code class="ph codeph">ALTER TABLE</code> statements. Remove any
+            <code class="ph codeph">INDEX</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">PRIMARY KEY</code> clauses from
+            <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. Impala is optimized for bulk
+            read operations for data warehouse-style queries, and therefore does not support indexes for its
+            tables.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Calls to built-in functions with out-of-range or otherwise incorrect arguments, return
+            <code class="ph codeph">NULL</code> in Impala as opposed to raising exceptions. (This rule applies even when the
+            <code class="ph codeph">ABORT_ON_ERROR=true</code> query option is in effect.) Run small-scale queries using
+            representative data to doublecheck that calls to built-in functions are returning expected values
+            rather than <code class="ph codeph">NULL</code>. For example, unsupported <code class="ph codeph">CAST</code> operations do not
+            raise an error in Impala:
+          </p>
+<pre class="pre codeblock"><code>select cast('foo' as int);
++--------------------+
+| cast('foo' as int) |
++--------------------+
+| NULL               |
++--------------------+</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For any other type not supported in Impala, you could represent their values in string format and write
+            UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            To detect the presence of unsupported or unconvertable data types in data files, do initial testing
+            with the <code class="ph codeph">ABORT_ON_ERROR=true</code> query option in effect. This option causes queries to
+            fail immediately if they encounter disallowed type conversions. See
+            <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a> for details. For example:
+          </p>
+<pre class="pre codeblock"><code>set abort_on_error=true;
+select count(*) from (select * from t1);
+-- The above query will fail if the data files for T1 contain any
+-- values that can't be converted to the expected Impala data types.
+-- For example, if T1.C1 is defined as INT but the column contains
+-- floating-point values like 1.1, the query will return an error.</code></pre>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="porting__porting_statements">
+
+    <h2 class="title topictitle2" id="ariaid-title4">SQL Statements to Remove or Adapt</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Some SQL statements or clauses that you might be familiar with are not currently supported in Impala:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Impala has no <code class="ph codeph">DELETE</code> statement. Impala is intended for data warehouse-style operations
+            where you do bulk moves and transforms of large quantities of data. Instead of using
+            <code class="ph codeph">DELETE</code>, use <code class="ph codeph">INSERT OVERWRITE</code> to entirely replace the contents of a
+            table or partition, or use <code class="ph codeph">INSERT ... SELECT</code> to copy a subset of data (everything but
+            the rows you intended to delete) from one table to another. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for
+            an overview of Impala DML statements.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala has no <code class="ph codeph">UPDATE</code> statement. Impala is intended for data warehouse-style operations
+            where you do bulk moves and transforms of large quantities of data. Instead of using
+            <code class="ph codeph">UPDATE</code>, do all necessary transformations early in the ETL process, such as in the job
+            that generates the original data, or when copying from one table to another to convert to a particular
+            file format or partitioning scheme. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for an overview of Impala DML
+            statements.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala has no transactional statements, such as <code class="ph codeph">COMMIT</code> or <code class="ph codeph">ROLLBACK</code>.
+            Impala effectively works like the <code class="ph codeph">AUTOCOMMIT</code> mode in some database systems, where
+            changes take effect as soon as they are made.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If your database, table, column, or other names conflict with Impala reserved words, use different
+            names or quote the names with backticks. See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+            for the current list of Impala reserved words.
+          </p>
+          <p class="p">
+            Conversely, if you use a keyword that Impala does not recognize, it might be interpreted as a table or
+            column alias. For example, in <code class="ph codeph">SELECT * FROM t1 NATURAL JOIN t2</code>, Impala does not
+            recognize the <code class="ph codeph">NATURAL</code> keyword and interprets it as an alias for the table
+            <code class="ph codeph">t1</code>. If you experience any unexpected behavior with queries, check the list of reserved
+            words to make sure all keywords in join and <code class="ph codeph">WHERE</code> clauses are recognized.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala supports subqueries only in the <code class="ph codeph">FROM</code> clause of a query, not within the
+            <code class="ph codeph">WHERE</code> clauses. Therefore, you cannot use clauses such as <code class="ph codeph">WHERE
+            <var class="keyword varname">column</var> IN (<var class="keyword varname">subquery</var>)</code>. Also, Impala does not allow
+            <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> clauses (although <code class="ph codeph">EXISTS</code> is a
+            reserved keyword).
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala supports <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code> set operators, but not
+            <code class="ph codeph">INTERSECT</code>. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+        data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+        because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Within queries, Impala requires query aliases for any subqueries:
+          </p>
+<pre class="pre codeblock"><code>-- Without the alias 'contents_of_t1' at the end, query gives syntax error.
+select count(*) from (select * from t1) contents_of_t1;</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When an alias is declared for an expression in a query, that alias cannot be referenced again within
+            the same query block:
+          </p>
+<pre class="pre codeblock"><code>-- Can't reference AVERAGE twice in the SELECT list where it's defined.
+select avg(x) as average, average+1 from t1 group by x;
+ERROR: AnalysisException: couldn't resolve column reference: 'average'
+
+-- Although it can be referenced again later in the same query.
+select avg(x) as average from t1 group by x having average &gt; 3;</code></pre>
+          <p class="p">
+            For Impala, either repeat the expression again, or abstract the expression into a <code class="ph codeph">WITH</code>
+            clause, creating named columns that can be referenced multiple times anywhere in the base query:
+          </p>
+<pre class="pre codeblock"><code>-- The following 2 query forms are equivalent.
+select avg(x) as average, avg(x)+1 from t1 group by x;
+with avg_t as (select avg(x) average from t1 group by x) select average, average+1 from avg_t;</code></pre>
+
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala does not support certain rarely used join types that are less appropriate for high-volume tables
+            used for data warehousing. In some cases, Impala supports join types but requires explicit syntax to
+            ensure you do not do inefficient joins of huge tables by accident. For example, Impala does not support
+            natural joins or anti-joins, and requires the <code class="ph codeph">CROSS JOIN</code> operator for Cartesian
+            products. See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details on the syntax for Impala join clauses.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala has a limited choice of partitioning types. Partitions are defined based on each distinct
+            combination of values for one or more partition key columns. Impala does not redistribute or check data
+            to create evenly distributed partitions; you must choose partition key columns based on your knowledge
+            of the data volume and distribution. Adapt any tables that use range, list, hash, or key partitioning
+            to use the Impala partition syntax for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+            statements. Impala partitioning is similar to range partitioning where every range has exactly one
+            value, or key partitioning where the hash function produces a separate bucket for every combination of
+            key values. See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for usage details, and
+            <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+            <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax.
+          </p>
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+            Because the number of separate partitions is potentially higher than in other database systems, keep a
+            close eye on the number of partitions and the volume of data in each one; scale back the number of
+            partition key columns if you end up with too many partitions with a small volume of data in each one.
+            Remember, to distribute work for a query across a cluster, you need at least one HDFS block per node.
+            HDFS blocks are typically multiple megabytes, <span class="ph">especially</span> for Parquet
+            files. Therefore, if each partition holds only a few megabytes of data, you are unlikely to see much
+            parallelism in the query because such a small amount of data is typically processed by a single node.
+          </div>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For <span class="q">"top-N"</span> queries, Impala uses the <code class="ph codeph">LIMIT</code> clause rather than comparing against a
+            pseudocolumn named <code class="ph codeph">ROWNUM</code> or <code class="ph codeph">ROW_NUM</code>. See
+            <a class="xref" href="impala_limit.html#limit">LIMIT Clause</a> for details.
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="porting__porting_antipatterns">
+
+    <h2 class="title topictitle2" id="ariaid-title5">SQL Constructs to Doublecheck</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Some SQL constructs that are supported have behavior or defaults more oriented towards convenience than
+        optimal performance. Also, sometimes machine-generated SQL, perhaps issued through JDBC or ODBC
+        applications, might have inefficiencies or exceed internal Impala limits. As you port SQL code, be alert
+        and change these things where appropriate:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">STORED AS</code> clause creates data files
+            in plain text format, which is convenient for data interchange but not a good choice for high-volume
+            data with high-performance queries. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for why and
+            how to use specific file formats for compact data and high-performance queries. Especially see
+            <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>, for details about the file format most heavily optimized for
+            large-scale data warehouse queries.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">PARTITIONED BY</code> clause stores all the
+            data files in the same physical location, which can lead to scalability problems when the data volume
+            becomes large.
+          </p>
+          <p class="p">
+            On the other hand, adapting tables that were already partitioned in a different database system could
+            produce an Impala table with a high number of partitions and not enough data in each one, leading to
+            underutilization of Impala's parallel query features.
+          </p>
+          <p class="p">
+            See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about setting up partitioning and
+            tuning the performance of queries on partitioned tables.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">INSERT ... VALUES</code> syntax is suitable for setting up toy tables with a few rows for
+            functional testing, but because each such statement creates a separate tiny file in HDFS, it is not a
+            scalable technique for loading megabytes or gigabytes (let alone petabytes) of data. Consider revising
+            your data load process to produce raw data files outside of Impala, then setting up Impala external
+            tables or using the <code class="ph codeph">LOAD DATA</code> statement to use those data files instantly in Impala
+            tables, with no conversion or indexing stage. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a> and
+            <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details about the Impala techniques for working with
+            data files produced outside of Impala; see <a class="xref" href="impala_tutorial.html#tutorial_etl">Data Loading and Querying Examples</a> for examples
+            of ETL workflow for Impala.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If your ETL process is not optimized for Hadoop, you might end up with highly fragmented small data
+            files, or a single giant data file that cannot take advantage of distributed parallel queries or
+            partitioning. In this case, use an <code class="ph codeph">INSERT ... SELECT</code> statement to copy the data into a
+            new table and reorganize into a more efficient layout in the same operation. See
+            <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details about the <code class="ph codeph">INSERT</code> statement.
+          </p>
+          <p class="p">
+            You can do <code class="ph codeph">INSERT ... SELECT</code> into a table with a more efficient file format (see
+            <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>) or from an unpartitioned table into a partitioned
+            one (see <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>).
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The number of expressions allowed in an Impala query might be smaller than for some other database
+            systems, causing failures for very complicated queries (typically produced by automated SQL
+            generators). Where practical, keep the number of expressions in the <code class="ph codeph">WHERE</code> clauses to
+            approximately 2000 or fewer. As a workaround, set the query option
+            <code class="ph codeph">DISABLE_CODEGEN=true</code> if queries fail for this reason. See
+            <a class="xref" href="impala_disable_codegen.html#disable_codegen">DISABLE_CODEGEN Query Option</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If practical, rewrite <code class="ph codeph">UNION</code> queries to use the <code class="ph codeph">UNION ALL</code> operator
+            instead. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+        data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+        because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="porting__porting_next">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Next Porting Steps after Verifying Syntax and Semantics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Throughout this section, some of the decisions you make during the porting process also have a substantial
+        impact on performance. After your SQL code is ported and working correctly, doublecheck the
+        performance-related aspects of your schema design, physical layout, and queries to make sure that the
+        ported application is taking full advantage of Impala's parallelism, performance-related SQL features, and
+        integration with Hadoop components.
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Have you run the <code class="ph codeph">COMPUTE STATS</code> statement on each table involved in join queries? Have
+          you also run <code class="ph codeph">COMPUTE STATS</code> for each table used as the source table in an <code class="ph codeph">INSERT
+          ... SELECT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statement?
+        </li>
+
+        <li class="li">
+          Are you using the most efficient file format for your data volumes, table structure, and query
+          characteristics?
+        </li>
+
+        <li class="li">
+          Are you using partitioning effectively? That is, have you partitioned on columns that are often used for
+          filtering in <code class="ph codeph">WHERE</code> clauses? Have you partitioned at the right granularity so that there
+          is enough data in each partition to parallelize the work for each query?
+        </li>
+
+        <li class="li">
+          Does your ETL process produce a relatively small number of multi-megabyte data files (good) rather than a
+          huge number of small files (bad)?
+        </li>
+      </ul>
+
+      <p class="p">
+        See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details about the whole performance tuning
+        process.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ports.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ports.html b/docs/build3x/html/topics/impala_ports.html
new file mode 100644
index 0000000..5acc1b6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ports.html
@@ -0,0 +1,421 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ports"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Ports Used by Impala</title></head><body id="ports"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Ports Used by Impala</h1>
+
+
+  <div class="body conbody" id="ports__conbody_ports">
+
+    <p class="p">
+
+      Impala uses the TCP ports listed in the following table. Before deploying Impala, ensure these ports are open
+      on each system.
+    </p>
+
+    <table class="table"><caption></caption><colgroup><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"><col style="width:9.090909090909092%"><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__1">
+              Component
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__2">
+              Service
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__3">
+              Port
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__4">
+              Access Requirement
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__5">
+              Comment
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon Frontend Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                21000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Used to transmit commands and receive results by <code class="ph codeph">impala-shell</code> and
+                some ODBC drivers.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon Frontend Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                21050
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Used to transmit commands and receive results by applications, such as Business Intelligence tools,
+                using JDBC, the Beeswax query editor in Hue, and some ODBC drivers.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon Backend Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                22000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. Impala daemons use this port to communicate with each other.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStoreSubscriber Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                23000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. Impala daemons listen on this port for updates from the statestore daemon.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Catalog Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStoreSubscriber Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                23020
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. The catalog daemon listens on this port for updates from the statestore daemon.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon HTTP Server Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                25000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Impala web interface for administrators to monitor and troubleshoot.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala StateStore Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStore HTTP Server Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                25010
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                StateStore web interface for administrators to monitor and troubleshoot.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Catalog Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Catalog HTTP Server Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                25020
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and
+                higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala StateStore Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStore Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                24000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. The statestore daemon listens on this port for registration/unregistration
+                requests.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Catalog Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStore Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                26000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. The catalog service uses this port to communicate with the Impala daemons. New
+                in Impala 1.2 and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama Callback Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                28000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. Impala daemons use to communicate with Llama. New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Llama ApplicationMaster
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama Thrift Admin Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                15002
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Llama ApplicationMaster
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama Thrift Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                15000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Llama ApplicationMaster
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama HTTP Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                15001
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Llama service web interface for administrators to monitor and troubleshoot.
+                New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+        </tbody></table>
+  </div>
+</article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prefetch_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prefetch_mode.html b/docs/build3x/html/topics/impala_prefetch_mode.html
new file mode 100644
index 0000000..b7cc1f5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prefetch_mode.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prefetch_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PREFETCH_MODE Query Option (Impala 2.6 or higher only)</title></head><body id="prefetch_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">PREFETCH_MODE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      Determines whether the prefetching optimization is applied during
+      join query processing.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric (0, 1)
+      or corresponding mnemonic strings (<code class="ph codeph">NONE</code>, <code class="ph codeph">HT_BUCKET</code>).
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 1 (equivalent to <code class="ph codeph">HT_BUCKET</code>)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      The default mode is 1, which means that hash table buckets are
+      prefetched during join query processing.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>,
+      <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prereqs.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prereqs.html b/docs/build3x/html/topics/impala_prereqs.html
new file mode 100644
index 0000000..88293d4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prereqs.html
@@ -0,0 +1,275 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prereqs"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Requirements</title></head><body id="prereqs"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Requirements</h1>
+
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+
+      To perform as expected, Impala depends on the availability of the software, hardware, and configurations
+      described in the following sections.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="prereqs__prereqs_os">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Supported Operating Systems</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+
+
+
+
+
+
+
+
+        Apache Impala runs on Linux systems only. See the <span class="ph filepath">README.md</span>
+        file for more information.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="prereqs__prereqs_hive">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Hive Metastore and Related Configuration</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+
+
+
+        Impala can interoperate with data stored in Hive, and uses the same infrastructure as Hive for tracking
+        metadata about schema objects such as tables and columns. The following components are prerequisites for
+        Impala:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          MySQL or PostgreSQL, to act as a metastore database for both Impala and Hive.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+            <p class="p">
+              Installing and configuring a Hive metastore is an Impala requirement. Impala does not work without
+              the metastore database. For the process of installing and configuring the metastore, see
+              <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+            </p>
+
+            <p class="p">
+              Always configure a <strong class="ph b">Hive metastore service</strong> rather than connecting directly to the metastore
+              database. The Hive metastore service is required to interoperate between different levels of
+              metastore APIs if this is necessary for your environment, and using it avoids known issues with
+              connecting directly to the metastore database.
+            </p>
+
+            <p class="p">
+              A summary of the metastore installation process is as follows:
+            </p>
+            <ul class="ul">
+              <li class="li">
+                Install a MySQL or PostgreSQL database. Start the database if it is not started after installation.
+              </li>
+
+              <li class="li">
+                Download the
+                <a class="xref" href="http://www.mysql.com/products/connector/" target="_blank">MySQL
+                connector</a> or the
+                <a class="xref" href="http://jdbc.postgresql.org/download.html" target="_blank">PostgreSQL
+                connector</a> and place it in the <code class="ph codeph">/usr/share/java/</code> directory.
+              </li>
+
+              <li class="li">
+                Use the appropriate command line tool for your database to create the metastore database.
+              </li>
+
+              <li class="li">
+                Use the appropriate command line tool for your database to grant privileges for the metastore
+                database to the <code class="ph codeph">hive</code> user.
+              </li>
+
+              <li class="li">
+                Modify <code class="ph codeph">hive-site.xml</code> to include information matching your particular database: its
+                URL, username, and password. You will copy the <code class="ph codeph">hive-site.xml</code> file to the Impala
+                Configuration Directory later in the Impala installation process.
+              </li>
+            </ul>
+          </div>
+        </li>
+
+        <li class="li">
+          <strong class="ph b">Optional:</strong> Hive. Although only the Hive metastore database is required for Impala to function, you
+          might install Hive on some client machines to create and load data into tables that use certain file
+          formats. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. Hive does not need to be
+          installed on the same DataNodes as Impala; it just needs access to the same metastore database.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="prereqs__prereqs_java">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Java Dependencies</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+
+
+        Although Impala is primarily written in C++, it does use Java to communicate with various Hadoop
+        components:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The officially supported JVM for Impala is the Oracle JVM. Other JVMs might cause issues, typically
+          resulting in a failure at <span class="keyword cmdname">impalad</span> startup. In particular, the JamVM used by default on
+          certain levels of Ubuntu systems can cause <span class="keyword cmdname">impalad</span> to fail to start.
+        </li>
+
+        <li class="li">
+          Internally, the <span class="keyword cmdname">impalad</span> daemon relies on the <code class="ph codeph">JAVA_HOME</code> environment
+          variable to locate the system Java libraries. Make sure the <span class="keyword cmdname">impalad</span> service is not run
+          from an environment with an incorrect setting for this variable.
+        </li>
+
+        <li class="li">
+          All Java dependencies are packaged in the <code class="ph codeph">impala-dependencies.jar</code> file, which is located
+          at <code class="ph codeph">/usr/lib/impala/lib/</code>. These map to everything that is built under
+          <code class="ph codeph">fe/target/dependency</code>.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="prereqs__prereqs_network">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Networking Configuration Requirements</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+
+        As part of ensuring best performance, Impala attempts to complete tasks on local data, as opposed to using
+        network connections to work with remote data. To support this goal, Impala matches
+        the&nbsp;<strong class="ph b">hostname</strong>&nbsp;provided to each Impala daemon with the&nbsp;<strong class="ph b">IP address</strong>&nbsp;of each DataNode by
+        resolving the hostname flag to an IP address. For Impala to work with local data, use a single IP interface
+        for the DataNode and the Impala daemon on each machine. Ensure that the Impala daemon's hostname flag
+        resolves to the IP address of the DataNode. For single-homed machines, this is usually automatic, but for
+        multi-homed machines, ensure that the Impala daemon's hostname resolves to the correct interface. Impala
+        tries to detect the correct hostname at start-up, and prints the derived hostname at the start of the log
+        in a message of the form:
+      </p>
+
+<pre class="pre codeblock"><code>Using hostname: impala-daemon-1.example.com</code></pre>
+
+      <p class="p">
+        In the majority of cases, this automatic detection works correctly. If you need to explicitly set the
+        hostname, do so by setting the&nbsp;<code class="ph codeph">--hostname</code>&nbsp;flag.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="prereqs__prereqs_hardware">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Hardware Requirements</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+
+
+
+
+
+
+
+
+        During join operations, portions of data from each joined table are loaded into memory. Data sets can be
+        very large, so ensure your hardware has sufficient memory to accommodate the joins you anticipate
+        completing.
+      </p>
+
+      <p class="p">
+        While requirements vary according to data set size, the following is generally recommended:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          CPU - Impala version 2.2 and higher uses the SSSE3 instruction set, which is included in newer processors.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+            This required level of processor is the same as in Impala version 1.x. The Impala 2.0 and 2.1 releases
+            had a stricter requirement for the SSE4.1 instruction set, which has now been relaxed.
+          </div>
+
+        </li>
+
+        <li class="li">
+          Memory - 128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query
+          processing on a particular node exceed the amount of memory available to Impala on that node, the query
+          writes temporary work data to disk, which can lead to long query times. Note that because the work is
+          parallelized, and intermediate results for aggregate queries are typically smaller than the original
+          data, Impala can query and join tables that are much larger than the memory available on an individual
+          node.
+        </li>
+
+        <li class="li">
+          Storage - DataNodes with 12 or more disks each. I/O speeds are often the limiting factor for disk
+          performance with Impala. Ensure that you have sufficient disk space to store the data Impala will be
+          querying.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="prereqs__prereqs_account">
+
+    <h2 class="title topictitle2" id="ariaid-title7">User Account Requirements</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+
+
+
+        Impala creates and uses a user and group named <code class="ph codeph">impala</code>. Do not delete this account or group
+        and do not modify the account's or group's permissions and rights. Ensure no existing systems obstruct the
+        functioning of these accounts and groups. For example, if you have scripts that delete user accounts not in
+        a white-list, add these accounts to the list of permitted accounts.
+      </p>
+
+      <p class="p">
+        For correct file deletion during <code class="ph codeph">DROP TABLE</code> operations, Impala must be able to move files
+        to the HDFS trashcan. You might need to create an HDFS directory <span class="ph filepath">/user/impala</span>,
+        writeable by the <code class="ph codeph">impala</code> user, so that the trashcan can be created. Otherwise, data files
+        might remain behind after a <code class="ph codeph">DROP TABLE</code> statement.
+      </p>
+
+      <p class="p">
+        Impala should not run as root. Best Impala performance is achieved using direct reads, but root is not
+        permitted to use direct reads. Therefore, running Impala as root negatively affects performance.
+      </p>
+
+      <p class="p">
+        By default, any user can connect to Impala and access all the associated databases and tables. You can
+        enable authorization and authentication based on the Linux OS user who connects to the Impala server, and
+        the associated groups for that user. <a class="xref" href="impala_security.html#security">Impala Security</a> for details. These
+        security features do not change the underlying file permission requirements; the <code class="ph codeph">impala</code>
+        user still needs to be able to access the data files.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_processes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_processes.html b/docs/build3x/html/topics/impala_processes.html
new file mode 100644
index 0000000..4d64072
--- /dev/null
+++ b/docs/build3x/html/topics/impala_processes.html
@@ -0,0 +1,115 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="processes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Starting Impala</title></head><body id="processes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Starting Impala</h1>
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+
+
+      To activate Impala if it is installed but not yet started:
+    </p>
+
+    <ol class="ol">
+      <li class="li">
+        Set any necessary configuration options for the Impala services. See
+        <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+      </li>
+
+      <li class="li">
+        Start one instance of the Impala statestore. The statestore helps Impala to distribute work efficiently,
+        and to continue running in the event of availability problems for other Impala nodes. If the statestore
+        becomes unavailable, Impala continues to function.
+      </li>
+
+      <li class="li">
+        Start one instance of the Impala catalog service.
+      </li>
+
+      <li class="li">
+        Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
+        processing and avoid network traffic due to remote reads.
+      </li>
+    </ol>
+
+    <p class="p">
+      Once Impala is running, you can conduct interactive experiments using the instructions in
+      <a class="xref" href="impala_tutorial.html#tutorial">Impala Tutorials</a> and try <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_options.html">Modifying Impala Startup Options</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="processes__starting_via_cmdline">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Starting Impala from the Command Line</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To start the Impala state store and Impala from the command line or a script, you can either use the
+        <span class="keyword cmdname">service</span> command or you can start the daemons directly through the
+        <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <span class="keyword cmdname">catalogd</span> executables.
+      </p>
+
+      <p class="p">
+        Start the Impala statestore and then start <code class="ph codeph">impalad</code> instances. You can modify the values
+        the service initialization scripts use when starting the statestore and Impala by editing
+        <code class="ph codeph">/etc/default/impala</code>.
+      </p>
+
+      <p class="p">
+        Start the statestore service using a command similar to the following:
+      </p>
+
+      <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-state-store start</code></pre>
+      </div>
+
+      <p class="p">
+        Start the catalog service using a command similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog start</code></pre>
+
+      <p class="p">
+        Start the Impala service on each DataNode using a command similar to the following:
+      </p>
+
+      <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-server start</code></pre>
+      </div>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+        <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+        Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+        where the Java function argument and return types are omitted.
+        Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+        because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+        Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+        you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+        you restart the <span class="keyword cmdname">catalogd</span> daemon.
+        Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+      </p>
+      </div>
+
+      <div class="p">
+        If any of the services fail to start, review:
+        <ul class="ul">
+          <li class="li">
+            <a class="xref" href="impala_logging.html#logs_debug">Reviewing Impala Logs</a>
+          </li>
+
+          <li class="li">
+            <a class="xref" href="impala_troubleshooting.html#troubleshooting">Troubleshooting Impala</a>
+          </li>
+        </ul>
+      </div>
+    </div>
+  </article>
+</article></main></body></html>