You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:21 UTC
[12/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_planning.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_planning.html b/docs/build3x/html/topics/impala_planning.html
new file mode 100644
index 0000000..e571e42
--- /dev/null
+++ b/docs/build3x/html/topics/impala_planning.html
@@ -0,0 +1,20 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala Deployment</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Before you set up Impala in production, do some planning to make sure that your hardware setup has sufficient
+ capacity, that your cluster topology is optimal for Impala queries, and that your schema design and ETL
+ processes follow the best practices for Impala.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_porting.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_porting.html b/docs/build3x/html/topics/impala_porting.html
new file mode 100644
index 0000000..8a8ba7e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_porting.html
@@ -0,0 +1,603 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="porting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Porting SQL from Other Database Systems to Impala</title></head><body id="porting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Porting SQL from Other Database Systems to Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Although Impala uses standard SQL for queries, you might need to modify SQL source when bringing applications
+ to Impala, due to variations in data types, built-in functions, vendor language extensions, and
+ Hadoop-specific syntax. Even when SQL is working correctly, you might make further minor modifications for
+ best performance.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="porting__porting_ddl_dml">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Porting DDL and DML Statements</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When adapting SQL code from a traditional database system to Impala, expect to find a number of differences
+ in the DDL statements that you use to set up the schema. Clauses related to physical layout of files,
+ tablespaces, and indexes have no equivalent in Impala. You might restructure your schema considerably to
+ account for the Impala partitioning scheme and Hadoop file formats.
+ </p>
+
+ <p class="p">
+ Expect SQL queries to have a much higher degree of compatibility. With modest rewriting to address vendor
+ extensions and features not yet supported in Impala, you might be able to run identical or almost-identical
+ query text on both systems.
+ </p>
+
+ <p class="p">
+ Therefore, consider separating out the DDL into a separate Impala-specific setup script. Focus your reuse
+ and ongoing tuning efforts on the code for SQL queries.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="porting__porting_data_types">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Porting Data Types from Other Database Systems</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">VARCHAR</code>, <code class="ph codeph">VARCHAR2</code>, and <code class="ph codeph">CHAR</code> columns to
+ <code class="ph codeph">STRING</code>. Remove any length constraints from the column declarations; for example,
+ change <code class="ph codeph">VARCHAR(32)</code> or <code class="ph codeph">CHAR(1)</code> to <code class="ph codeph">STRING</code>. Impala is
+ very flexible about the length of string values; it does not impose any length constraints
+ or do any special processing (such as blank-padding) for <code class="ph codeph">STRING</code> columns.
+ (In Impala 2.0 and higher, there are data types <code class="ph codeph">VARCHAR</code> and <code class="ph codeph">CHAR</code>,
+ with length constraints for both types and blank-padding for <code class="ph codeph">CHAR</code>.
+ However, for performance reasons, it is still preferable to use <code class="ph codeph">STRING</code>
+ columns where practical.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For national language character types such as <code class="ph codeph">NCHAR</code>, <code class="ph codeph">NVARCHAR</code>, or
+ <code class="ph codeph">NCLOB</code>, be aware that while Impala can store and query UTF-8 character data, currently
+ some string manipulation operations only work correctly with ASCII data. See
+ <a class="xref" href="impala_string.html#string">STRING Data Type</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>, or <code class="ph codeph">TIME</code> columns to
+ <code class="ph codeph">TIMESTAMP</code>. Remove any precision constraints. Remove any timezone clauses, and make
+ sure your application logic or ETL process accounts for the fact that Impala expects all
+ <code class="ph codeph">TIMESTAMP</code> values to be in
+ <a class="xref" href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time" target="_blank">Coordinated
+ Universal Time (UTC)</a>. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about
+ the <code class="ph codeph">TIMESTAMP</code> data type, and
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for conversion functions for different
+ date and time formats.
+ </p>
+ <p class="p">
+ You might also need to adapt date- and time-related literal values and format strings to use the
+ supported Impala date and time formats. If you have date and time literals with different separators or
+ different numbers of <code class="ph codeph">YY</code>, <code class="ph codeph">MM</code>, and so on placeholders than Impala
+ expects, consider using calls to <code class="ph codeph">regexp_replace()</code> to transform those values to the
+ Impala-compatible format. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about the
+ allowed formats for date and time literals, and
+ <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for string conversion functions such as
+ <code class="ph codeph">regexp_replace()</code>.
+ </p>
+ <p class="p">
+ Instead of <code class="ph codeph">SYSDATE</code>, call the function <code class="ph codeph">NOW()</code>.
+ </p>
+ <p class="p">
+ Instead of adding or subtracting directly from a date value to produce a value <var class="keyword varname">N</var>
+ days in the past or future, use an <code class="ph codeph">INTERVAL</code> expression, for example <code class="ph codeph">NOW() +
+ INTERVAL 30 DAYS</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Although Impala supports <code class="ph codeph">INTERVAL</code> expressions for datetime arithmetic, as shown in
+ <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>, <code class="ph codeph">INTERVAL</code> is not available as a column
+ data type in Impala. For any <code class="ph codeph">INTERVAL</code> values stored in tables, convert them to numeric
+ values that you can add or subtract using the functions in
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>. For example, if you had a table
+ <code class="ph codeph">DEADLINES</code> with an <code class="ph codeph">INT</code> column <code class="ph codeph">TIME_PERIOD</code>, you could
+ construct dates N days in the future like so:
+ </p>
+<pre class="pre codeblock"><code>SELECT NOW() + INTERVAL time_period DAYS from deadlines;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For <code class="ph codeph">YEAR</code> columns, change to the smallest Impala integer type that has sufficient
+ range. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on
+ for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">NUMBER</code> types. If fixed-point precision is not
+ required, you can use <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> on the Impala side depending on
+ the range of values. For applications that require precise decimal values, such as financial data, you
+ might need to make more extensive changes to table structure and application logic, such as using
+ separate integer columns for dollars and cents, or encoding numbers as string values and writing UDFs
+ to manipulate them. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges,
+ casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, and <code class="ph codeph">REAL</code> types are supported in
+ Impala. Remove any precision and scale specifications. (In Impala, <code class="ph codeph">REAL</code> is just an
+ alias for <code class="ph codeph">DOUBLE</code>; columns declared as <code class="ph codeph">REAL</code> are turned into
+ <code class="ph codeph">DOUBLE</code> behind the scenes.) See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for
+ details about ranges, casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Most integer types from other systems have equivalents in Impala, perhaps under different names such as
+ <code class="ph codeph">BIGINT</code> instead of <code class="ph codeph">INT8</code>. For any that are unavailable, for example
+ <code class="ph codeph">MEDIUMINT</code>, switch to the smallest Impala integer type that has sufficient range.
+ Remove any precision specifications. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details
+ about ranges, casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Remove any <code class="ph codeph">UNSIGNED</code> constraints. All Impala numeric types are signed. See
+ <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on for the
+ various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For any types holding bitwise values, use an integer type with enough range to hold all the relevant
+ bits within a positive integer. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about
+ ranges, casting, and so on for the various numeric data types.
+ </p>
+ <p class="p">
+ For example, <code class="ph codeph">TINYINT</code> has a maximum positive value of 127, not 256, so to manipulate
+ 8-bit bitfields as positive numbers switch to the next largest type <code class="ph codeph">SMALLINT</code>.
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select cast(127*2 as tinyint);
++--------------------------+
+| cast(127 * 2 as tinyint) |
++--------------------------+
+| -2 |
++--------------------------+
+[localhost:21000] > select cast(128 as tinyint);
++----------------------+
+| cast(128 as tinyint) |
++----------------------+
+| -128 |
++----------------------+
+[localhost:21000] > select cast(127*2 as smallint);
++---------------------------+
+| cast(127 * 2 as smallint) |
++---------------------------+
+| 254 |
++---------------------------+</code></pre>
+ <p class="p">
+ Impala does not support notation such as <code class="ph codeph">b'0101'</code> for bit literals.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For BLOB values, use <code class="ph codeph">STRING</code> to represent <code class="ph codeph">CLOB</code> or
+ <code class="ph codeph">TEXT</code> types (character based large objects) up to 32 KB in size. Binary large objects
+ such as <code class="ph codeph">BLOB</code>, <code class="ph codeph">RAW</code> <code class="ph codeph">BINARY</code>, and
+ <code class="ph codeph">VARBINARY</code> do not currently have an equivalent in Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For Boolean-like types such as <code class="ph codeph">BOOL</code>, use the Impala <code class="ph codeph">BOOLEAN</code> type.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Impala currently does not support composite or nested types, any spatial data types in other
+ database systems do not have direct equivalents in Impala. You could represent spatial values in string
+ format and write UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details. Where
+ practical, separate spatial types into separate tables so that Impala can still work with the
+ non-spatial data.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any <code class="ph codeph">DEFAULT</code> clauses. Impala can use data files produced from many different
+ sources, such as Pig, Hive, or MapReduce jobs. The fast import mechanisms of <code class="ph codeph">LOAD DATA</code>
+ and external tables mean that Impala is flexible about the format of data files, and Impala does not
+ necessarily validate or cleanse data before querying it. When copying data through Impala
+ <code class="ph codeph">INSERT</code> statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or
+ <code class="ph codeph">NVL</code> to substitute some other value for <code class="ph codeph">NULL</code> fields; see
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any constraints from your <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements, for example <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">FOREIGN KEY</code>,
+ <code class="ph codeph">UNIQUE</code>, <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">UNSIGNED</code>, or
+ <code class="ph codeph">CHECK</code> constraints. Impala can use data files produced from many different sources,
+ such as Pig, Hive, or MapReduce jobs. Therefore, Impala expects initial data validation to happen
+ earlier during the ETL or ELT cycle. After data is loaded into Impala tables, you can perform queries
+ to test for <code class="ph codeph">NULL</code> values. When copying data through Impala <code class="ph codeph">INSERT</code>
+ statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or <code class="ph codeph">NVL</code> to
+ substitute some other value for <code class="ph codeph">NULL</code> fields; see
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
+ <p class="p">
+ Do as much verification as practical before loading data into Impala. After data is loaded into Impala,
+ you can do further verification using SQL queries to check if values have expected ranges, if values
+ are <code class="ph codeph">NULL</code> or not, and so on. If there is a problem with the data, you will need to
+ re-run earlier stages of the ETL process, or do an <code class="ph codeph">INSERT ... SELECT</code> statement in
+ Impala to copy the faulty data to a new table and transform or filter out the bad values.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any <code class="ph codeph">CREATE INDEX</code>, <code class="ph codeph">DROP INDEX</code>, and <code class="ph codeph">ALTER
+ INDEX</code> statements, and equivalent <code class="ph codeph">ALTER TABLE</code> statements. Remove any
+ <code class="ph codeph">INDEX</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">PRIMARY KEY</code> clauses from
+ <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. Impala is optimized for bulk
+ read operations for data warehouse-style queries, and therefore does not support indexes for its
+ tables.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Calls to built-in functions with out-of-range or otherwise incorrect arguments, return
+ <code class="ph codeph">NULL</code> in Impala as opposed to raising exceptions. (This rule applies even when the
+ <code class="ph codeph">ABORT_ON_ERROR=true</code> query option is in effect.) Run small-scale queries using
+ representative data to doublecheck that calls to built-in functions are returning expected values
+ rather than <code class="ph codeph">NULL</code>. For example, unsupported <code class="ph codeph">CAST</code> operations do not
+ raise an error in Impala:
+ </p>
+<pre class="pre codeblock"><code>select cast('foo' as int);
++--------------------+
+| cast('foo' as int) |
++--------------------+
+| NULL |
++--------------------+</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For any other type not supported in Impala, you could represent their values in string format and write
+ UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ To detect the presence of unsupported or unconvertable data types in data files, do initial testing
+ with the <code class="ph codeph">ABORT_ON_ERROR=true</code> query option in effect. This option causes queries to
+ fail immediately if they encounter disallowed type conversions. See
+ <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a> for details. For example:
+ </p>
+<pre class="pre codeblock"><code>set abort_on_error=true;
+select count(*) from (select * from t1);
+-- The above query will fail if the data files for T1 contain any
+-- values that can't be converted to the expected Impala data types.
+-- For example, if T1.C1 is defined as INT but the column contains
+-- floating-point values like 1.1, the query will return an error.</code></pre>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="porting__porting_statements">
+
+ <h2 class="title topictitle2" id="ariaid-title4">SQL Statements to Remove or Adapt</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Some SQL statements or clauses that you might be familiar with are not currently supported in Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala has no <code class="ph codeph">DELETE</code> statement. Impala is intended for data warehouse-style operations
+ where you do bulk moves and transforms of large quantities of data. Instead of using
+ <code class="ph codeph">DELETE</code>, use <code class="ph codeph">INSERT OVERWRITE</code> to entirely replace the contents of a
+ table or partition, or use <code class="ph codeph">INSERT ... SELECT</code> to copy a subset of data (everything but
+ the rows you intended to delete) from one table to another. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for
+ an overview of Impala DML statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has no <code class="ph codeph">UPDATE</code> statement. Impala is intended for data warehouse-style operations
+ where you do bulk moves and transforms of large quantities of data. Instead of using
+ <code class="ph codeph">UPDATE</code>, do all necessary transformations early in the ETL process, such as in the job
+ that generates the original data, or when copying from one table to another to convert to a particular
+ file format or partitioning scheme. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for an overview of Impala DML
+ statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has no transactional statements, such as <code class="ph codeph">COMMIT</code> or <code class="ph codeph">ROLLBACK</code>.
+ Impala effectively works like the <code class="ph codeph">AUTOCOMMIT</code> mode in some database systems, where
+ changes take effect as soon as they are made.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If your database, table, column, or other names conflict with Impala reserved words, use different
+ names or quote the names with backticks. See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+ for the current list of Impala reserved words.
+ </p>
+ <p class="p">
+ Conversely, if you use a keyword that Impala does not recognize, it might be interpreted as a table or
+ column alias. For example, in <code class="ph codeph">SELECT * FROM t1 NATURAL JOIN t2</code>, Impala does not
+ recognize the <code class="ph codeph">NATURAL</code> keyword and interprets it as an alias for the table
+ <code class="ph codeph">t1</code>. If you experience any unexpected behavior with queries, check the list of reserved
+ words to make sure all keywords in join and <code class="ph codeph">WHERE</code> clauses are recognized.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports subqueries only in the <code class="ph codeph">FROM</code> clause of a query, not within the
+ <code class="ph codeph">WHERE</code> clauses. Therefore, you cannot use clauses such as <code class="ph codeph">WHERE
+ <var class="keyword varname">column</var> IN (<var class="keyword varname">subquery</var>)</code>. Also, Impala does not allow
+ <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> clauses (although <code class="ph codeph">EXISTS</code> is a
+ reserved keyword).
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code> set operators, but not
+ <code class="ph codeph">INTERSECT</code>. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+ data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+ because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Within queries, Impala requires query aliases for any subqueries:
+ </p>
+<pre class="pre codeblock"><code>-- Without the alias 'contents_of_t1' at the end, query gives syntax error.
+select count(*) from (select * from t1) contents_of_t1;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ When an alias is declared for an expression in a query, that alias cannot be referenced again within
+ the same query block:
+ </p>
+<pre class="pre codeblock"><code>-- Can't reference AVERAGE twice in the SELECT list where it's defined.
+select avg(x) as average, average+1 from t1 group by x;
+ERROR: AnalysisException: couldn't resolve column reference: 'average'
+
+-- Although it can be referenced again later in the same query.
+select avg(x) as average from t1 group by x having average > 3;</code></pre>
+ <p class="p">
+ For Impala, either repeat the expression again, or abstract the expression into a <code class="ph codeph">WITH</code>
+ clause, creating named columns that can be referenced multiple times anywhere in the base query:
+ </p>
+<pre class="pre codeblock"><code>-- The following 2 query forms are equivalent.
+select avg(x) as average, avg(x)+1 from t1 group by x;
+with avg_t as (select avg(x) average from t1 group by x) select average, average+1 from avg_t;</code></pre>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala does not support certain rarely used join types that are less appropriate for high-volume tables
+ used for data warehousing. In some cases, Impala supports join types but requires explicit syntax to
+ ensure you do not do inefficient joins of huge tables by accident. For example, Impala does not support
+ natural joins or anti-joins, and requires the <code class="ph codeph">CROSS JOIN</code> operator for Cartesian
+ products. See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details on the syntax for Impala join clauses.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has a limited choice of partitioning types. Partitions are defined based on each distinct
+ combination of values for one or more partition key columns. Impala does not redistribute or check data
+ to create evenly distributed partitions; you must choose partition key columns based on your knowledge
+ of the data volume and distribution. Adapt any tables that use range, list, hash, or key partitioning
+ to use the Impala partition syntax for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements. Impala partitioning is similar to range partitioning where every range has exactly one
+ value, or key partitioning where the hash function produces a separate bucket for every combination of
+ key values. See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for usage details, and
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the number of separate partitions is potentially higher than in other database systems, keep a
+ close eye on the number of partitions and the volume of data in each one; scale back the number of
+ partition key columns if you end up with too many partitions with a small volume of data in each one.
+ Remember, to distribute work for a query across a cluster, you need at least one HDFS block per node.
+ HDFS blocks are typically multiple megabytes, <span class="ph">especially</span> for Parquet
+ files. Therefore, if each partition holds only a few megabytes of data, you are unlikely to see much
+ parallelism in the query because such a small amount of data is typically processed by a single node.
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For <span class="q">"top-N"</span> queries, Impala uses the <code class="ph codeph">LIMIT</code> clause rather than comparing against a
+ pseudocolumn named <code class="ph codeph">ROWNUM</code> or <code class="ph codeph">ROW_NUM</code>. See
+ <a class="xref" href="impala_limit.html#limit">LIMIT Clause</a> for details.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="porting__porting_antipatterns">
+
+ <h2 class="title topictitle2" id="ariaid-title5">SQL Constructs to Doublecheck</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Some SQL constructs that are supported have behavior or defaults more oriented towards convenience than
+ optimal performance. Also, sometimes machine-generated SQL, perhaps issued through JDBC or ODBC
+ applications, might have inefficiencies or exceed internal Impala limits. As you port SQL code, be alert
+ and change these things where appropriate:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">STORED AS</code> clause creates data files
+ in plain text format, which is convenient for data interchange but not a good choice for high-volume
+ data with high-performance queries. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for why and
+ how to use specific file formats for compact data and high-performance queries. Especially see
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>, for details about the file format most heavily optimized for
+ large-scale data warehouse queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">PARTITIONED BY</code> clause stores all the
+ data files in the same physical location, which can lead to scalability problems when the data volume
+ becomes large.
+ </p>
+ <p class="p">
+ On the other hand, adapting tables that were already partitioned in a different database system could
+ produce an Impala table with a high number of partitions and not enough data in each one, leading to
+ underutilization of Impala's parallel query features.
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about setting up partitioning and
+ tuning the performance of queries on partitioned tables.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT ... VALUES</code> syntax is suitable for setting up toy tables with a few rows for
+ functional testing, but because each such statement creates a separate tiny file in HDFS, it is not a
+ scalable technique for loading megabytes or gigabytes (let alone petabytes) of data. Consider revising
+ your data load process to produce raw data files outside of Impala, then setting up Impala external
+ tables or using the <code class="ph codeph">LOAD DATA</code> statement to use those data files instantly in Impala
+ tables, with no conversion or indexing stage. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a> and
+ <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details about the Impala techniques for working with
+ data files produced outside of Impala; see <a class="xref" href="impala_tutorial.html#tutorial_etl">Data Loading and Querying Examples</a> for examples
+ of ETL workflow for Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If your ETL process is not optimized for Hadoop, you might end up with highly fragmented small data
+ files, or a single giant data file that cannot take advantage of distributed parallel queries or
+ partitioning. In this case, use an <code class="ph codeph">INSERT ... SELECT</code> statement to copy the data into a
+ new table and reorganize into a more efficient layout in the same operation. See
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details about the <code class="ph codeph">INSERT</code> statement.
+ </p>
+ <p class="p">
+ You can do <code class="ph codeph">INSERT ... SELECT</code> into a table with a more efficient file format (see
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>) or from an unpartitioned table into a partitioned
+ one (see <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>).
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The number of expressions allowed in an Impala query might be smaller than for some other database
+ systems, causing failures for very complicated queries (typically produced by automated SQL
+ generators). Where practical, keep the number of expressions in the <code class="ph codeph">WHERE</code> clauses to
+ approximately 2000 or fewer. As a workaround, set the query option
+ <code class="ph codeph">DISABLE_CODEGEN=true</code> if queries fail for this reason. See
+ <a class="xref" href="impala_disable_codegen.html#disable_codegen">DISABLE_CODEGEN Query Option</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If practical, rewrite <code class="ph codeph">UNION</code> queries to use the <code class="ph codeph">UNION ALL</code> operator
+ instead. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+ data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+ because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="porting__porting_next">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Next Porting Steps after Verifying Syntax and Semantics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Throughout this section, some of the decisions you make during the porting process also have a substantial
+ impact on performance. After your SQL code is ported and working correctly, doublecheck the
+ performance-related aspects of your schema design, physical layout, and queries to make sure that the
+ ported application is taking full advantage of Impala's parallelism, performance-related SQL features, and
+ integration with Hadoop components.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Have you run the <code class="ph codeph">COMPUTE STATS</code> statement on each table involved in join queries? Have
+ you also run <code class="ph codeph">COMPUTE STATS</code> for each table used as the source table in an <code class="ph codeph">INSERT
+ ... SELECT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statement?
+ </li>
+
+ <li class="li">
+ Are you using the most efficient file format for your data volumes, table structure, and query
+ characteristics?
+ </li>
+
+ <li class="li">
+ Are you using partitioning effectively? That is, have you partitioned on columns that are often used for
+ filtering in <code class="ph codeph">WHERE</code> clauses? Have you partitioned at the right granularity so that there
+ is enough data in each partition to parallelize the work for each query?
+ </li>
+
+ <li class="li">
+ Does your ETL process produce a relatively small number of multi-megabyte data files (good) rather than a
+ huge number of small files (bad)?
+ </li>
+ </ul>
+
+ <p class="p">
+ See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details about the whole performance tuning
+ process.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ports.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ports.html b/docs/build3x/html/topics/impala_ports.html
new file mode 100644
index 0000000..5acc1b6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ports.html
@@ -0,0 +1,421 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ports"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Ports Used by Impala</title></head><body id="ports"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Ports Used by Impala</h1>
+
+
+ <div class="body conbody" id="ports__conbody_ports">
+
+ <p class="p">
+
+ Impala uses the TCP ports listed in the following table. Before deploying Impala, ensure these ports are open
+ on each system.
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"><col style="width:9.090909090909092%"><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__1">
+ Component
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__2">
+ Service
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__3">
+ Port
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__4">
+ Access Requirement
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__5">
+ Comment
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Frontend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 21000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Used to transmit commands and receive results by <code class="ph codeph">impala-shell</code> and
+ some ODBC drivers.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Frontend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 21050
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Used to transmit commands and receive results by applications, such as Business Intelligence tools,
+ using JDBC, the Beeswax query editor in Hue, and some ODBC drivers.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Backend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 22000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. Impala daemons use this port to communicate with each other.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStoreSubscriber Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 23000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. Impala daemons listen on this port for updates from the statestore daemon.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStoreSubscriber Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 23020
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The catalog daemon listens on this port for updates from the statestore daemon.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Impala web interface for administrators to monitor and troubleshoot.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala StateStore Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStore HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25010
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ StateStore web interface for administrators to monitor and troubleshoot.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Catalog HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25020
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and
+ higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala StateStore Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStore Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 24000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The statestore daemon listens on this port for registration/unregistration
+ requests.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStore Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 26000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The catalog service uses this port to communicate with the Impala daemons. New
+ in Impala 1.2 and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Callback Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 28000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. Impala daemons use to communicate with Llama. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Thrift Admin Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15002
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Thrift Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama HTTP Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15001
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Llama service web interface for administrators to monitor and troubleshoot.
+ New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ </tbody></table>
+ </div>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prefetch_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prefetch_mode.html b/docs/build3x/html/topics/impala_prefetch_mode.html
new file mode 100644
index 0000000..b7cc1f5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prefetch_mode.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prefetch_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PREFETCH_MODE Query Option (Impala 2.6 or higher only)</title></head><body id="prefetch_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PREFETCH_MODE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Determines whether the prefetching optimization is applied during
+ join query processing.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric (0, 1)
+ or corresponding mnemonic strings (<code class="ph codeph">NONE</code>, <code class="ph codeph">HT_BUCKET</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1 (equivalent to <code class="ph codeph">HT_BUCKET</code>)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The default mode is 1, which means that hash table buckets are
+ prefetched during join query processing.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>,
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prereqs.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prereqs.html b/docs/build3x/html/topics/impala_prereqs.html
new file mode 100644
index 0000000..88293d4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prereqs.html
@@ -0,0 +1,275 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prereqs"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Requirements</title></head><body id="prereqs"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Requirements</h1>
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ To perform as expected, Impala depends on the availability of the software, hardware, and configurations
+ described in the following sections.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="prereqs__prereqs_os">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Supported Operating Systems</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ Apache Impala runs on Linux systems only. See the <span class="ph filepath">README.md</span>
+ file for more information.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="prereqs__prereqs_hive">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Hive Metastore and Related Configuration</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Impala can interoperate with data stored in Hive, and uses the same infrastructure as Hive for tracking
+ metadata about schema objects such as tables and columns. The following components are prerequisites for
+ Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ MySQL or PostgreSQL, to act as a metastore database for both Impala and Hive.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Installing and configuring a Hive metastore is an Impala requirement. Impala does not work without
+ the metastore database. For the process of installing and configuring the metastore, see
+ <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+ </p>
+
+ <p class="p">
+ Always configure a <strong class="ph b">Hive metastore service</strong> rather than connecting directly to the metastore
+ database. The Hive metastore service is required to interoperate between different levels of
+ metastore APIs if this is necessary for your environment, and using it avoids known issues with
+ connecting directly to the metastore database.
+ </p>
+
+ <p class="p">
+ A summary of the metastore installation process is as follows:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Install a MySQL or PostgreSQL database. Start the database if it is not started after installation.
+ </li>
+
+ <li class="li">
+ Download the
+ <a class="xref" href="http://www.mysql.com/products/connector/" target="_blank">MySQL
+ connector</a> or the
+ <a class="xref" href="http://jdbc.postgresql.org/download.html" target="_blank">PostgreSQL
+ connector</a> and place it in the <code class="ph codeph">/usr/share/java/</code> directory.
+ </li>
+
+ <li class="li">
+ Use the appropriate command line tool for your database to create the metastore database.
+ </li>
+
+ <li class="li">
+ Use the appropriate command line tool for your database to grant privileges for the metastore
+ database to the <code class="ph codeph">hive</code> user.
+ </li>
+
+ <li class="li">
+ Modify <code class="ph codeph">hive-site.xml</code> to include information matching your particular database: its
+ URL, username, and password. You will copy the <code class="ph codeph">hive-site.xml</code> file to the Impala
+ Configuration Directory later in the Impala installation process.
+ </li>
+ </ul>
+ </div>
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Optional:</strong> Hive. Although only the Hive metastore database is required for Impala to function, you
+ might install Hive on some client machines to create and load data into tables that use certain file
+ formats. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. Hive does not need to be
+ installed on the same DataNodes as Impala; it just needs access to the same metastore database.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="prereqs__prereqs_java">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Java Dependencies</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ Although Impala is primarily written in C++, it does use Java to communicate with various Hadoop
+ components:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The officially supported JVM for Impala is the Oracle JVM. Other JVMs might cause issues, typically
+ resulting in a failure at <span class="keyword cmdname">impalad</span> startup. In particular, the JamVM used by default on
+ certain levels of Ubuntu systems can cause <span class="keyword cmdname">impalad</span> to fail to start.
+ </li>
+
+ <li class="li">
+ Internally, the <span class="keyword cmdname">impalad</span> daemon relies on the <code class="ph codeph">JAVA_HOME</code> environment
+ variable to locate the system Java libraries. Make sure the <span class="keyword cmdname">impalad</span> service is not run
+ from an environment with an incorrect setting for this variable.
+ </li>
+
+ <li class="li">
+ All Java dependencies are packaged in the <code class="ph codeph">impala-dependencies.jar</code> file, which is located
+ at <code class="ph codeph">/usr/lib/impala/lib/</code>. These map to everything that is built under
+ <code class="ph codeph">fe/target/dependency</code>.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="prereqs__prereqs_network">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Networking Configuration Requirements</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ As part of ensuring best performance, Impala attempts to complete tasks on local data, as opposed to using
+ network connections to work with remote data. To support this goal, Impala matches
+ the <strong class="ph b">hostname</strong> provided to each Impala daemon with the <strong class="ph b">IP address</strong> of each DataNode by
+ resolving the hostname flag to an IP address. For Impala to work with local data, use a single IP interface
+ for the DataNode and the Impala daemon on each machine. Ensure that the Impala daemon's hostname flag
+ resolves to the IP address of the DataNode. For single-homed machines, this is usually automatic, but for
+ multi-homed machines, ensure that the Impala daemon's hostname resolves to the correct interface. Impala
+ tries to detect the correct hostname at start-up, and prints the derived hostname at the start of the log
+ in a message of the form:
+ </p>
+
+<pre class="pre codeblock"><code>Using hostname: impala-daemon-1.example.com</code></pre>
+
+ <p class="p">
+ In the majority of cases, this automatic detection works correctly. If you need to explicitly set the
+ hostname, do so by setting the <code class="ph codeph">--hostname</code> flag.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="prereqs__prereqs_hardware">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Hardware Requirements</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ During join operations, portions of data from each joined table are loaded into memory. Data sets can be
+ very large, so ensure your hardware has sufficient memory to accommodate the joins you anticipate
+ completing.
+ </p>
+
+ <p class="p">
+ While requirements vary according to data set size, the following is generally recommended:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ CPU - Impala version 2.2 and higher uses the SSSE3 instruction set, which is included in newer processors.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ This required level of processor is the same as in Impala version 1.x. The Impala 2.0 and 2.1 releases
+ had a stricter requirement for the SSE4.1 instruction set, which has now been relaxed.
+ </div>
+
+ </li>
+
+ <li class="li">
+ Memory - 128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query
+ processing on a particular node exceed the amount of memory available to Impala on that node, the query
+ writes temporary work data to disk, which can lead to long query times. Note that because the work is
+ parallelized, and intermediate results for aggregate queries are typically smaller than the original
+ data, Impala can query and join tables that are much larger than the memory available on an individual
+ node.
+ </li>
+
+ <li class="li">
+ Storage - DataNodes with 12 or more disks each. I/O speeds are often the limiting factor for disk
+ performance with Impala. Ensure that you have sufficient disk space to store the data Impala will be
+ querying.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="prereqs__prereqs_account">
+
+ <h2 class="title topictitle2" id="ariaid-title7">User Account Requirements</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Impala creates and uses a user and group named <code class="ph codeph">impala</code>. Do not delete this account or group
+ and do not modify the account's or group's permissions and rights. Ensure no existing systems obstruct the
+ functioning of these accounts and groups. For example, if you have scripts that delete user accounts not in
+ a white-list, add these accounts to the list of permitted accounts.
+ </p>
+
+ <p class="p">
+ For correct file deletion during <code class="ph codeph">DROP TABLE</code> operations, Impala must be able to move files
+ to the HDFS trashcan. You might need to create an HDFS directory <span class="ph filepath">/user/impala</span>,
+ writeable by the <code class="ph codeph">impala</code> user, so that the trashcan can be created. Otherwise, data files
+ might remain behind after a <code class="ph codeph">DROP TABLE</code> statement.
+ </p>
+
+ <p class="p">
+ Impala should not run as root. Best Impala performance is achieved using direct reads, but root is not
+ permitted to use direct reads. Therefore, running Impala as root negatively affects performance.
+ </p>
+
+ <p class="p">
+ By default, any user can connect to Impala and access all the associated databases and tables. You can
+ enable authorization and authentication based on the Linux OS user who connects to the Impala server, and
+ the associated groups for that user. <a class="xref" href="impala_security.html#security">Impala Security</a> for details. These
+ security features do not change the underlying file permission requirements; the <code class="ph codeph">impala</code>
+ user still needs to be able to access the data files.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_processes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_processes.html b/docs/build3x/html/topics/impala_processes.html
new file mode 100644
index 0000000..4d64072
--- /dev/null
+++ b/docs/build3x/html/topics/impala_processes.html
@@ -0,0 +1,115 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="processes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Starting Impala</title></head><body id="processes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Starting Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ To activate Impala if it is installed but not yet started:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Set any necessary configuration options for the Impala services. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+ </li>
+
+ <li class="li">
+ Start one instance of the Impala statestore. The statestore helps Impala to distribute work efficiently,
+ and to continue running in the event of availability problems for other Impala nodes. If the statestore
+ becomes unavailable, Impala continues to function.
+ </li>
+
+ <li class="li">
+ Start one instance of the Impala catalog service.
+ </li>
+
+ <li class="li">
+ Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
+ processing and avoid network traffic due to remote reads.
+ </li>
+ </ol>
+
+ <p class="p">
+ Once Impala is running, you can conduct interactive experiments using the instructions in
+ <a class="xref" href="impala_tutorial.html#tutorial">Impala Tutorials</a> and try <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_options.html">Modifying Impala Startup Options</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="processes__starting_via_cmdline">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Starting Impala from the Command Line</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To start the Impala state store and Impala from the command line or a script, you can either use the
+ <span class="keyword cmdname">service</span> command or you can start the daemons directly through the
+ <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <span class="keyword cmdname">catalogd</span> executables.
+ </p>
+
+ <p class="p">
+ Start the Impala statestore and then start <code class="ph codeph">impalad</code> instances. You can modify the values
+ the service initialization scripts use when starting the statestore and Impala by editing
+ <code class="ph codeph">/etc/default/impala</code>.
+ </p>
+
+ <p class="p">
+ Start the statestore service using a command similar to the following:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-state-store start</code></pre>
+ </div>
+
+ <p class="p">
+ Start the catalog service using a command similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog start</code></pre>
+
+ <p class="p">
+ Start the Impala service on each DataNode using a command similar to the following:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-server start</code></pre>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+ where the Java function argument and return types are omitted.
+ Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+ because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+ Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+ you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+ you restart the <span class="keyword cmdname">catalogd</span> daemon.
+ Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+ </p>
+ </div>
+
+ <div class="p">
+ If any of the services fail to start, review:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_logging.html#logs_debug">Reviewing Impala Logs</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_troubleshooting.html#troubleshooting">Troubleshooting Impala</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+ </article>
+</article></main></body></html>