Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:44 UTC
[35/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_describe.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_describe.html b/docs/build3x/html/topics/impala_describe.html
new file mode 100644
index 0000000..5c4edf9
--- /dev/null
+++ b/docs/build3x/html/topics/impala_describe.html
@@ -0,0 +1,817 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="describe"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DESCRIBE Statement</title></head><body id="describe"><main role="main"><article role="article" aria-labelledby="describe__desc">
+
+ <h1 class="title topictitle1" id="describe__desc">DESCRIBE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DESCRIBE</code> statement displays metadata about a table, such as the column names and their
+ data types.
+ <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify the name of a complex type column, which takes
+ the form of a dotted path. The path might include multiple components in the case of a nested type definition.</span>
+ <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">DESCRIBE DATABASE</code> form can display
+ information about a database.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DESCRIBE [DATABASE] [FORMATTED|EXTENDED] <var class="keyword varname">object_name</var>
+
+object_name ::=
+ [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>[.<var class="keyword varname">complex_col_name</var> ...]
+ | <var class="keyword varname">db_name</var>
+</code></pre>
+
+ <p class="p">
+ You can use the abbreviation <code class="ph codeph">DESC</code> for the <code class="ph codeph">DESCRIBE</code> statement.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">DESCRIBE FORMATTED</code> variation displays additional information, in a format familiar to
+ users of Apache Hive. The extra information includes low-level details such as whether the table is internal
+ or external, when it was created, the file format, the location of the data in HDFS, whether the object is a
+ table or a view, and (for views) the text of the query from the view definition.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">Compressed</code> field is not a reliable indicator of whether the table contains compressed
+ data. It typically shows <code class="ph codeph">No</code>, because the compression settings only apply during the
+ session that loads data and are not stored persistently with the table metadata.
+ </div>
+
+<p class="p">
+ <strong class="ph b">Describing databases:</strong>
+</p>
+
+<p class="p">
+ By default, the <code class="ph codeph">DESCRIBE</code> output for a database includes the location
+ and the comment, which can be set by the <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>
+ clauses on the <code class="ph codeph">CREATE DATABASE</code> statement.
+</p>
+
+<p class="p">
+ The additional information displayed by the <code class="ph codeph">FORMATTED</code> or <code class="ph codeph">EXTENDED</code>
+ keyword includes the HDFS user ID that is considered the owner of the database, and any
+ optional database properties. The properties can be specified through the <code class="ph codeph">WITH DBPROPERTIES</code>
+ clause when the database is created using a Hive <code class="ph codeph">CREATE DATABASE</code> statement.
+ Impala currently does not set these properties itself or perform any special processing based on them.
+</p>
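+<p class="p">
+ For example, such properties might be attached on the Hive side and then viewed
+ through Impala. (The database name and property values below are hypothetical.)
+</p>
+
+<pre class="pre codeblock"><code>
+-- In Hive, not Impala:
+CREATE DATABASE history_db
+  WITH DBPROPERTIES ('creator' = 'doc_demo', 'purpose' = 'archive');
+
+-- In Impala, DESCRIBE DATABASE EXTENDED then lists those properties
+-- along with the owner and location:
+describe database extended history_db;
+</code></pre>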
+
+<p class="p">
+The following examples show the variations in syntax and output for
+describing databases. This feature is available in <span class="keyword">Impala 2.5</span>
+and higher.
+</p>
+
+<pre class="pre codeblock"><code>
+describe database default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
++---------+----------------------+-----------------------+
+
+describe database formatted default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner: | | |
+| | public | ROLE |
++---------+----------------------+-----------------------+
+
+describe database extended default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner: | | |
+| | public | ROLE |
++---------+----------------------+-----------------------+
+</code></pre>
+
+<p class="p">
+ <strong class="ph b">Describing tables:</strong>
+</p>
+
+<p class="p">
+ If the <code class="ph codeph">DATABASE</code> keyword is omitted, the default
+ for the <code class="ph codeph">DESCRIBE</code> statement is to refer to a table.
+</p>
+ <p class="p">
+ If you have the <code class="ph codeph">SELECT</code> privilege on a subset of the table
+ columns and no other relevant table/database/server-level privileges,
+ <code class="ph codeph">DESCRIBE</code> returns the data from the columns you have
+ access to.
+ </p>
+
+ <p class="p">
+ If you have the <code class="ph codeph">SELECT</code> privilege on a subset of the table
+ columns and no other relevant table/database/server-level privileges,
+ <code class="ph codeph">DESCRIBE FORMATTED/EXTENDED</code> does not return
+ the <code class="ph codeph">LOCATION</code> field. The <code class="ph codeph">LOCATION</code> data
+ is shown if you have any privilege on the table, the containing database,
+ or the server.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- By default, the table is assumed to be in the current database.
+describe my_table;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| s | string | |
++------+--------+---------+
+
+-- Use a fully qualified table name to specify a table in any database.
+describe my_database.my_table;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| s | string | |
++------+--------+---------+
+
+-- The formatted or extended output includes additional useful information.
+-- The LOCATION field is especially useful to know for DDL statements and HDFS commands
+-- during ETL jobs. (The LOCATION includes a full hdfs:// URL, omitted here for readability.)
+describe formatted my_table;
++------------------------------+----------------------------------------------+----------------------+
+| name | type | comment |
++------------------------------+----------------------------------------------+----------------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | NULL |
+| s | string | NULL |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | my_database | NULL |
+| Owner: | jrussell | NULL |
+| CreateTime: | Fri Mar 18 15:58:00 PDT 2016 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | /user/hive/warehouse/my_database.db/my_table | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1458341880 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org. ... .LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
+| OutputFormat: | org. ... .HiveIgnoreKeyTextOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+----------------------------------------------+----------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because the column definitions for complex types can become long, particularly when such types are nested,
+ the <code class="ph codeph">DESCRIBE</code> statement uses special formatting for complex type columns to make the output readable.
+ </p>
+
+ <p class="p">
+ For the <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types available in
+ <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">DESCRIBE</code> output is formatted to avoid
+ excessively long lines for multiple fields within a <code class="ph codeph">STRUCT</code>, or a nested sequence of
+ complex types.
+ </p>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ For example, here is the <code class="ph codeph">DESCRIBE</code> output for a table containing a single top-level column
+ of each complex type:
+ </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, a array<int>, s struct<f1: string, f2: bigint>, m map<string,int>) stored as parquet;
+
+describe t1;
++------+-----------------+---------+
+| name | type | comment |
++------+-----------------+---------+
+| x | int | |
+| a | array<int> | |
+| s | struct< | |
+| | f1:string, | |
+| | f2:bigint | |
+| | > | |
+| m | map<string,int> | |
++------+-----------------+---------+
+
+</code></pre>
+
+ <p class="p">
+ Here are examples showing how to <span class="q">"drill down"</span> into the layouts of complex types, including
+ using multi-part names to examine the definitions of nested types.
+ The <code class="ph codeph">< ></code> delimiters identify the columns with complex types;
+ these are the columns where you can descend another level to see the parts that make up
+ the complex type.
+ This technique helps you to understand the multi-part names you use as table references in queries
+ involving complex types, and the corresponding column names you refer to in the <code class="ph codeph">SELECT</code> list.
+ These tables are from the <span class="q">"nested TPC-H"</span> schema, shown in detail in
+ <a class="xref" href="impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">REGION</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The first <code class="ph codeph">DESCRIBE</code> specifies the table name, to display the definition
+ of each top-level column.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The second <code class="ph codeph">DESCRIBE</code> specifies the name of a complex
+ column, <code class="ph codeph">REGION.R_NATIONS</code>, showing that when you include the name of an <code class="ph codeph">ARRAY</code>
+ column in a <code class="ph codeph">FROM</code> clause, that table reference acts like a two-column table with
+ columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The final <code class="ph codeph">DESCRIBE</code> specifies the fully qualified name of the <code class="ph codeph">ITEM</code> field,
+ to display the layout of its underlying <code class="ph codeph">STRUCT</code> type in table format, with the fields
+ mapped to column names.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>
+-- #1: The overall layout of the entire table.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- #2: The ARRAY column within the table.
+describe region.r_nations;
++------+-------------------------+---------+
+| name | type | comment |
++------+-------------------------+---------+
+| item | struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | > | |
+| pos | bigint | |
++------+-------------------------+---------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+-- The fields of the STRUCT act like columns of a table.
+describe region.r_nations.item;
++-------------+----------+---------+
+| name | type | comment |
++-------------+----------+---------+
+| n_nationkey | smallint | |
+| n_name | string | |
+| n_comment | string | |
++-------------+----------+---------+
+
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">CUSTOMER</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements, where one field in the <code class="ph codeph">STRUCT</code> is another <code class="ph codeph">ARRAY</code> of
+ <code class="ph codeph">STRUCT</code> elements:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Again, the initial <code class="ph codeph">DESCRIBE</code> specifies only the table name.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The second <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the complex
+ column, <code class="ph codeph">CUSTOMER.C_ORDERS</code>, showing how an <code class="ph codeph">ARRAY</code>
+ is represented as a two-column table with columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The third <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the <code class="ph codeph">ITEM</code>
+ of the <code class="ph codeph">ARRAY</code> column, to see the structure of the nested <code class="ph codeph">ARRAY</code>.
+ Again, it has two parts, <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. Because the
+ <code class="ph codeph">ARRAY</code> contains a <code class="ph codeph">STRUCT</code>, the layout of the <code class="ph codeph">STRUCT</code>
+ is shown.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The fourth and fifth <code class="ph codeph">DESCRIBE</code> statements drill down into a <code class="ph codeph">STRUCT</code> field that
+ is itself a complex type, an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>.
+ The <code class="ph codeph">ITEM</code> portion of the qualified name is only required when the <code class="ph codeph">ARRAY</code>
+ elements are anonymous. The fields of the <code class="ph codeph">STRUCT</code> give names to any other complex types
+ nested inside the <code class="ph codeph">STRUCT</code>. Therefore, the <code class="ph codeph">DESCRIBE</code> parameters
+ <code class="ph codeph">CUSTOMER.C_ORDERS.ITEM.O_LINEITEMS</code> and <code class="ph codeph">CUSTOMER.C_ORDERS.O_LINEITEMS</code>
+ are equivalent. (For brevity, leave out the <code class="ph codeph">ITEM</code> portion of
+ a qualified name when it is not required.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The final <code class="ph codeph">DESCRIBE</code> shows the layout of the deeply nested <code class="ph codeph">STRUCT</code> type.
+ Because there are no more complex types nested inside this <code class="ph codeph">STRUCT</code>, this is as far
+ as you can drill down into the layout for this table.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>-- #1: The overall layout of the entire table.
+describe customer;
++--------------+------------------------------------+
+| name | type |
++--------------+------------------------------------+
+| c_custkey | bigint |
+... more scalar columns ...
+| c_orders | array<struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+| | o_totalprice:decimal(12,2), |
+| | o_orderdate:string, |
+| | o_orderpriority:string, |
+| | o_clerk:string, |
+| | o_shippriority:int, |
+| | o_comment:string, |
+| | o_lineitems:array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+| | l_linenumber:int, |
+| | l_quantity:decimal(12,2), |
+| | l_extendedprice:decimal(12,2), |
+| | l_discount:decimal(12,2), |
+| | l_tax:decimal(12,2), |
+| | l_returnflag:string, |
+| | l_linestatus:string, |
+| | l_shipdate:string, |
+| | l_commitdate:string, |
+| | l_receiptdate:string, |
+| | l_shipinstruct:string, |
+| | l_shipmode:string, |
+| | l_comment:string |
+| | >> |
+| | >> |
++--------------+------------------------------------+
+
+-- #2: The ARRAY column within the table.
+describe customer.c_orders;
++------+------------------------------------+
+| name | type |
++------+------------------------------------+
+| item | struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+... more struct fields ...
+| | o_lineitems:array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more nested struct fields ...
+| | l_comment:string |
+| | >> |
+| | > |
+| pos | bigint |
++------+------------------------------------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+-- The fields of the STRUCT act like columns of a table.
+describe customer.c_orders.item;
++-----------------+----------------------------------+
+| name | type |
++-----------------+----------------------------------+
+| o_orderkey | bigint |
+| o_orderstatus | string |
+| o_totalprice | decimal(12,2) |
+| o_orderdate | string |
+| o_orderpriority | string |
+| o_clerk | string |
+| o_shippriority | int |
+| o_comment | string |
+| o_lineitems | array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | >> |
++-----------------+----------------------------------+
+
+-- #4: The ARRAY nested inside the STRUCT elements of the first ARRAY.
+describe customer.c_orders.item.o_lineitems;
++------+----------------------------------+
+| name | type |
++------+----------------------------------+
+| item | struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | > |
+| pos | bigint |
++------+----------------------------------+
+
+-- #5: Shorter form of the previous DESCRIBE. Omits the .ITEM portion of the name
+-- because O_LINEITEMS and other field names provide a way to refer to things
+-- inside the ARRAY element.
+describe customer.c_orders.o_lineitems;
++------+----------------------------------+
+| name | type |
++------+----------------------------------+
+| item | struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | > |
+| pos | bigint |
++------+----------------------------------+
+
+-- #6: The STRUCT representing ARRAY elements nested inside
+-- another ARRAY of STRUCTs. The lack of any complex types
+-- in this output means this is as far as DESCRIBE can
+-- descend into the table layout.
+describe customer.c_orders.o_lineitems.item;
++-----------------+---------------+
+| name | type |
++-----------------+---------------+
+| l_partkey | bigint |
+| l_suppkey | bigint |
+... more scalar columns ...
+| l_comment | string |
++-----------------+---------------+
+
+</code></pre>
+
+<p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+<p class="p">
+ After the <span class="keyword cmdname">impalad</span> daemons are restarted, the first query against a table can take longer
+ than subsequent queries, because the metadata for the table is loaded before the query is processed. This
+ one-time delay for each table can cause misleading results in benchmark tests or cause unnecessary concern.
+ To <span class="q">"warm up"</span> the Impala metadata cache, you can issue a <code class="ph codeph">DESCRIBE</code> statement in advance
+ for each table you intend to access later.
+</p>
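+<p class="p">
+ For example, a startup script might warm the cache with a series of
+ <code class="ph codeph">DESCRIBE</code> statements. (The table names below are hypothetical.)
+</p>
+
+<pre class="pre codeblock"><code>
+-- Run once after the impalad daemons restart, before any benchmark queries,
+-- so that each table's metadata is already loaded:
+describe sales_db.transactions;
+describe sales_db.customers;
+describe sales_db.products;
+</code></pre>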
+
+<p class="p">
+ When you are dealing with data files stored in HDFS, sometimes it is important to know details such as the
+ path of the data files for an Impala table, and the hostname for the namenode. You can get this information
+ from the <code class="ph codeph">DESCRIBE FORMATTED</code> output. You specify HDFS URIs or path specifications with
+ statements such as <code class="ph codeph">LOAD DATA</code> and the <code class="ph codeph">LOCATION</code> clause of <code class="ph codeph">CREATE
+ TABLE</code> or <code class="ph codeph">ALTER TABLE</code>. You might also use HDFS URIs or paths with Linux commands
+ such as <span class="keyword cmdname">hadoop</span> and <span class="keyword cmdname">hdfs</span> to copy, rename, and so on, data files in HDFS.
+</p>
+
+<p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
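+<p class="p">
+ For example, in a load-balanced <span class="keyword cmdname">impala-shell</span> session,
+ you might enable the option before issuing DDL statements. (This is a sketch; the
+ table and column names are hypothetical.)
+</p>
+
+<pre class="pre codeblock"><code>
+set SYNC_DDL=1;
+alter table sales_db.transactions add columns (region string);
+-- The ALTER TABLE returns only after all Impala nodes have the new metadata,
+-- so a DESCRIBE routed to any node shows the added column:
+describe sales_db.transactions;
+</code></pre>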
+
+<p class="p">
+ Each table can also have associated table statistics and column statistics. To see these categories of
+ information, use the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and <code class="ph codeph">SHOW COLUMN
+ STATS <var class="keyword varname">table_name</var></code> statements.
+
+ See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+</p>
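+<p class="p">
+ For example (the table name below is hypothetical):
+</p>
+
+<pre class="pre codeblock"><code>
+-- Table-level statistics: row counts, number of files, sizes, and so on.
+show table stats sales_db.transactions;
+
+-- Per-column statistics: distinct values, nulls, and value sizes.
+show column stats sales_db.transactions;
+</code></pre>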
+
+<div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
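+<p class="p">
+ For example, a typical ETL sequence might look like the following. (The statements
+ are a sketch; the table names are hypothetical.)
+</p>
+
+<pre class="pre codeblock"><code>
+-- After loading new data...
+insert into sales_db.transactions select * from sales_db.staging;
+-- ...refresh the statistics so the planner sees current row counts:
+compute stats sales_db.transactions;
+</code></pre>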
+
+<p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<p class="p">
+ The following example shows the results of both a standard <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">DESCRIBE
+ FORMATTED</code> for different kinds of schema objects:
+</p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">DESCRIBE</code> for a table or a view returns the name, type, and comment for each of the
+ columns. For a view, if the column value is computed by an expression, the column name is automatically
+ generated as <code class="ph codeph">_c0</code>, <code class="ph codeph">_c1</code>, and so on depending on the ordinal number of the
+ column.
+ </li>
+
+ <li class="li">
+ A table created with no special format or storage clauses is designated as a <code class="ph codeph">MANAGED_TABLE</code>
+ (an <span class="q">"internal table"</span> in Impala terminology). Its data files are stored in an HDFS directory under the
+ default Hive data directory. By default, it uses Text data format.
+ </li>
+
+ <li class="li">
+ A view is designated as <code class="ph codeph">VIRTUAL_VIEW</code> in <code class="ph codeph">DESCRIBE FORMATTED</code> output. Some
+ of its properties are <code class="ph codeph">NULL</code> or blank because they are inherited from the base table. The
+ text of the query that defines the view is part of the <code class="ph codeph">DESCRIBE FORMATTED</code> output.
+ </li>
+
+ <li class="li">
+ A table with additional clauses in the <code class="ph codeph">CREATE TABLE</code> statement has differences in
+ <code class="ph codeph">DESCRIBE FORMATTED</code> output. The output for <code class="ph codeph">T2</code> includes the
+ <code class="ph codeph">EXTERNAL_TABLE</code> keyword because of the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, and
+ different <code class="ph codeph">InputFormat</code> and <code class="ph codeph">OutputFormat</code> fields to reflect the Parquet file
+ format.
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, y int, s string);
+Query: create table t1 (x int, y int, s string)
+[localhost:21000] > describe t1;
+Query: describe t1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| y | int | |
+| s | string | |
++------+--------+---------+
+Returned 3 row(s) in 0.13s
+[localhost:21000] > describe formatted t1;
+Query: describe formatted t1
+Query finished, fetching results ...
++------------------------------+--------------------------------------------+------------+
+| name | type | comment |
++------------------------------+--------------------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 17:03:16 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | hdfs://127.0.0.1:8020/user/hive/warehouse/ | |
+| | describe_formatted.db/t1 | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1374526996 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org.apache.hadoop.hive.serde2.lazy. | |
+| | LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
+| OutputFormat: | org.apache.hadoop.hive.ql.io. | |
+| | HiveIgnoreKeyTextOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+--------------------------------------------+------------+
+Returned 26 row(s) in 0.03s
+[localhost:21000] > create view v1 as select x, upper(s) from t1;
+Query: create view v1 as select x, upper(s) from t1
+[localhost:21000] > describe v1;
+Query: describe v1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| _c1 | string | |
++------+--------+---------+
+Returned 2 row(s) in 0.10s
+[localhost:21000] > describe formatted v1;
+Query: describe formatted v1
+Query finished, fetching results ...
++------------------------------+------------------------------+----------------------+
+| name | type | comment |
++------------------------------+------------------------------+----------------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| _c1 | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 16:56:38 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Table Type: | VIRTUAL_VIEW | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1374526598 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | null | NULL |
+| InputFormat: | null | NULL |
+| OutputFormat: | null | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
+| | NULL | NULL |
+| # View Information | NULL | NULL |
+| View Original Text: | SELECT x, upper(s) FROM t1 | NULL |
+| View Expanded Text: | SELECT x, upper(s) FROM t1 | NULL |
++------------------------------+------------------------------+----------------------+
+Returned 28 row(s) in 0.03s
+[localhost:21000] > create external table t2 (x int, y int, s string) stored as parquet location '/user/doc_demo/sample_data';
+[localhost:21000] > describe formatted t2;
+Query: describe formatted t2
+Query finished, fetching results ...
++------------------------------+----------------------------------------------------+------------+
+| name | type | comment |
++------------------------------+----------------------------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 17:01:47 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | hdfs://127.0.0.1:8020/user/doc_demo/sample_data | NULL |
+| Table Type: | EXTERNAL_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | EXTERNAL | TRUE |
+| | transient_lastDdlTime | 1374526907 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.impala.hive.serde.ParquetInputFormat | NULL |
+| OutputFormat: | org.apache.impala.hive.serde.ParquetOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+----------------------------------------------------+------------+
+Returned 27 row(s) in 0.17s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ The information displayed for Kudu tables includes the additional attributes
+ that are only applicable for Kudu tables:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Whether or not the column is part of the primary key. Every Kudu table
+ has a <code class="ph codeph">true</code> value here for at least one column. There
+ could be multiple <code class="ph codeph">true</code> values, for tables with
+ composite primary keys.
+ </li>
+ <li class="li">
+ Whether or not the column is nullable. Specified by the <code class="ph codeph">NULL</code>
+ or <code class="ph codeph">NOT NULL</code> attributes on the <code class="ph codeph">CREATE TABLE</code> statement.
+ Columns that are part of the primary key are automatically non-nullable.
+ </li>
+ <li class="li">
+ The default value, if any, for the column. Specified by the <code class="ph codeph">DEFAULT</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement. If the default value is
+ <code class="ph codeph">NULL</code>, that is not indicated in this column. It is implied by
+ <code class="ph codeph">nullable</code> being true and no other default value specified.
+ </li>
+ <li class="li">
+ The encoding used for values in the column. Specified by the <code class="ph codeph">ENCODING</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+ <li class="li">
+ The compression used for values in the column. Specified by the <code class="ph codeph">COMPRESSION</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+ <li class="li">
+ The block size (in bytes) used for the underlying Kudu storage layer for the column.
+ Specified by the <code class="ph codeph">BLOCK_SIZE</code> attribute on the <code class="ph codeph">CREATE TABLE</code>
+ statement.
+ </li>
+ </ul>
+
+ <p class="p">
+ The following example shows <code class="ph codeph">DESCRIBE</code> output for a simple Kudu table, with
+ a single-column primary key and all column attributes left with their default values:
+ </p>
+
+<pre class="pre codeblock"><code>
+describe million_rows;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| id | string | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| s | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows <code class="ph codeph">DESCRIBE</code> output for a Kudu table with a
+ two-column primary key, and Kudu-specific attributes applied to some columns:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table kudu_describe_example
+(
+ c1 int, c2 int,
+ c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
+ c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle,
+ primary key(c1,c2)
+)
+partition by hash (c1, c2) partitions 10 stored as kudu;
+
+describe kudu_describe_example;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| c1 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c2 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c3 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c4 | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c5 | string | | false | true | n/a | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c6 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c7 | bigint | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c8 | bigint | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c9 | bigint | | false | true | -1 | BIT_SHUFFLE | DEFAULT_COMPRESSION | 0 |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_development.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_development.html b/docs/build3x/html/topics/impala_development.html
new file mode 100644
index 0000000..5b11207
--- /dev/null
+++ b/docs/build3x/html/topics/impala_development.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_dev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Developing Impala Applications</title></head><body id="intro_dev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Developing Impala Applications</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The core development language with Impala is SQL. You can also use Java or other languages to interact with
+ Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For
+ specialized kinds of analysis, you can supplement the SQL built-in functions by writing
+ <a class="xref" href="impala_udf.html#udfs">user-defined functions (UDFs)</a> in C++ or Java.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_dev__intro_sql">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of the Impala SQL Dialect</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala SQL dialect is highly compatible with the SQL syntax used in the Apache Hive component (HiveQL),
+ so it is immediately recognizable to users who already run SQL queries on the Hadoop
+ infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in
+ functions. Impala also includes additional built-in functions for common industry features, to simplify
+ porting SQL from non-Hadoop systems.
+ </p>
+
+ <p class="p">
+ For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+ might seem familiar:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_select.html#select">SELECT statement</a> includes familiar clauses such as <code class="ph codeph">WHERE</code>,
+ <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">WITH</code>.
+ You will find familiar notions such as
+ <a class="xref" href="impala_joins.html#joins">joins</a>, <a class="xref" href="impala_functions.html#builtins">built-in
+ functions</a> for processing strings, numbers, and dates,
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>,
+ <a class="xref" href="impala_subqueries.html#subqueries">subqueries</a>, and
+ <a class="xref" href="impala_operators.html#comparison_operators">comparison operators</a>
+ such as <code class="ph codeph">IN()</code> and <code class="ph codeph">BETWEEN</code>.
+ The <code class="ph codeph">SELECT</code> statement is the place where SQL standards compliance is most important.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ From the data warehousing world, you will recognize the notion of
+ <a class="xref" href="impala_partitioning.html#partitioning">partitioned tables</a>.
+ One or more columns serve as partition keys, and the data is physically arranged so that
+ queries that refer to the partition key columns in the <code class="ph codeph">WHERE</code> clause
+ can skip partitions that do not match the filter conditions. For example, if you have 10
+ years' worth of data and use a clause such as <code class="ph codeph">WHERE year = 2015</code>,
+ <code class="ph codeph">WHERE year > 2010</code>, or <code class="ph codeph">WHERE year IN (2014, 2015)</code>,
+ Impala skips all the data for non-matching years, greatly reducing the amount of I/O
+ for the query.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In Impala 1.2 and higher, <a class="xref" href="impala_udf.html#udfs">UDFs</a> let you perform custom comparisons
+ and transformation logic during <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT...SELECT</code> statements.
+ </p>
+ </li>
+ </ul>
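+
+ <p class="p">
+ The partition pruning described in the list above can be sketched as follows. This is a
+ minimal, hypothetical example; the table and column names are for illustration only.
+ </p>
+
+<pre class="pre codeblock"><code>-- Partitioned table: Impala stores each year's data in a separate HDFS directory.
+create table sales (id bigint, amount decimal(10,2))
+  partitioned by (year int)
+  stored as parquet;
+
+-- The WHERE clause refers to the partition key, so Impala reads only
+-- the directories for the matching years and skips all the others.
+select count(*) from sales where year in (2014, 2015);</code></pre>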
+
+ <p class="p">
+ For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+ might require some learning and practice for you to become proficient in the Hadoop environment:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala SQL is focused on queries and includes relatively little DML. There is no <code class="ph codeph">UPDATE</code>
+ or <code class="ph codeph">DELETE</code> statement for HDFS-backed tables. (Kudu tables do support
+ <code class="ph codeph">UPDATE</code> and <code class="ph codeph">DELETE</code>.) Stale data is typically discarded (by <code class="ph codeph">DROP TABLE</code>
+ or <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code> statements) or replaced (by <code class="ph codeph">INSERT
+ OVERWRITE</code> statements).
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ All data creation is done by <code class="ph codeph">INSERT</code> statements, which typically insert data in bulk by
+ querying from other tables. There are two variations, <code class="ph codeph">INSERT INTO</code> which appends to the
+ existing data, and <code class="ph codeph">INSERT OVERWRITE</code> which replaces the entire contents of a table or
+ partition (similar to <code class="ph codeph">TRUNCATE TABLE</code> followed by a new <code class="ph codeph">INSERT</code>).
+ Although there is an <code class="ph codeph">INSERT ... VALUES</code> syntax to create a small number of values in
+ a single statement, it is far more efficient to use the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy
+ and transform large amounts of data from one table to another in a single operation.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You often construct Impala table definitions and data files in some other environment, and then attach
+ Impala so that it can run real-time queries. The same data files and table metadata are shared with other
+ components of the Hadoop ecosystem. In particular, Impala can access tables created by Hive or data
+ inserted by Hive, and Hive can access tables and data produced by Impala. Many other Hadoop components
+ can write files in formats such as Parquet and Avro, which can then be queried by Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL
+ includes some idioms that you might find in the import utilities for traditional database systems. For
+ example, you can create a table that reads comma-separated or tab-separated text files, specifying the
+ separator in the <code class="ph codeph">CREATE TABLE</code> statement. You can create <strong class="ph b">external tables</strong> that read
+ existing data files but do not move or transform them.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does
+ not require length constraints on string data types. For example, you can define a database column as
+ <code class="ph codeph">STRING</code> with unlimited length, rather than <code class="ph codeph">CHAR(1)</code> or
+ <code class="ph codeph">VARCHAR(64)</code>. <span class="ph">(Although in Impala 2.0 and later, you can also use
+ length-constrained <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> types.)</span>
+ </p>
+ </li>
+
+ </ul>
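+
+ <p class="p">
+ The idioms above can be sketched with a short, hypothetical example; the table names and
+ the HDFS path are for illustration only.
+ </p>
+
+<pre class="pre codeblock"><code>-- External table that reads existing comma-separated text files in place,
+-- without moving or transforming them.
+create external table raw_events (event_time string, user_id bigint, detail string)
+  row format delimited fields terminated by ','
+  location '/user/doc_demo/raw_events';
+
+-- Bulk copy and transform into a Parquet table. INSERT OVERWRITE would
+-- replace the existing contents instead of appending.
+insert into events_parquet
+  select cast(event_time as timestamp), user_id, upper(detail) from raw_events;</code></pre>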
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_langref.html#langref">Impala SQL Language Reference</a>, especially
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a> and <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>
+ </p>
+ </div>
+ </article>
+
+
+
+
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_dev__intro_apis">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Programming Interfaces</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can connect and submit requests to the Impala daemons through:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph"><a class="xref" href="impala_impala_shell.html#impala_shell">impala-shell</a></code> interactive
+ command interpreter.
+ </li>
+
+ <li class="li">
+ The <a class="xref" href="http://gethue.com/" target="_blank">Hue</a> web-based user interface.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_jdbc.html#impala_jdbc">JDBC</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_odbc.html#impala_odbc">ODBC</a>.
+ </li>
+ </ul>
+
+ <p class="p">
+ With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications
+ running on non-Linux platforms. You can also use Impala in combination with various Business Intelligence
+ tools that use the JDBC and ODBC interfaces.
+ </p>
+
+ <p class="p">
+ Each <code class="ph codeph">impalad</code> daemon process, running on separate nodes in a cluster, listens to
+ <a class="xref" href="impala_ports.html#ports">several ports</a> for incoming requests. Requests from
+ <code class="ph codeph">impala-shell</code> and Hue are routed to the <code class="ph codeph">impalad</code> daemons through the same
+ port. The <code class="ph codeph">impalad</code> daemons listen on separate ports for JDBC and ODBC requests.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_codegen.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_codegen.html b/docs/build3x/html/topics/impala_disable_codegen.html
new file mode 100644
index 0000000..3fae1e7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_codegen.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_codegen"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_CODEGEN Query Option</title></head><body id="disable_codegen"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_CODEGEN Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ This is a debug option, intended for diagnosing and working around issues that cause crashes. If a query
+ fails with an <span class="q">"illegal instruction"</span> or other hardware-specific message, try setting
+ <code class="ph codeph">DISABLE_CODEGEN=true</code> and running the query again. If the query succeeds only when the
+ <code class="ph codeph">DISABLE_CODEGEN</code> option is turned on, submit the problem to <span class="keyword">the appropriate support channel</span> and include that
+ detail in the problem report. Do not otherwise run with this setting turned on, because it results in lower
+ overall performance.
+ </p>
+
+ <p class="p">
+ Because the code generation phase adds a small amount of overhead for each query, you might turn on the
+ <code class="ph codeph">DISABLE_CODEGEN</code> option to achieve maximum throughput when running many short-lived queries
+ against small tables.
+ </p>
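+
+ <p class="p">
+ For example, a diagnostic session might look like the following; the query itself is
+ hypothetical.
+ </p>
+
+<pre class="pre codeblock"><code>-- Rerun a query that crashed with an "illegal instruction" error,
+-- this time with code generation turned off.
+set disable_codegen=true;
+select count(*) from t1 join t2 using (id);
+
+-- Restore the default afterward, because code generation normally
+-- improves performance.
+set disable_codegen=false;</code></pre>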
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html b/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
new file mode 100644
index 0000000..80d84f5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
@@ -0,0 +1,90 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_row_runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</title></head><body id="disable_row_runtime_filtering"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_ROW_RUNTIME_FILTERING Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code> query option
+ reduces the scope of the runtime filtering feature. Queries still dynamically prune
+ partitions, but do not apply the filtering logic to individual rows within partitions.
+ </p>
+
+ <p class="p">
+ This option only applies to queries against Parquet tables. For other file formats, Impala
+ only prunes at the level of partitions, not individual rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Impala automatically evaluates whether the per-row filters are being
+ effective at reducing the amount of intermediate data. Therefore,
+ this option is typically only needed for the rare case where Impala
+ cannot accurately determine how effective the per-row filtering is
+ for a query.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ Because this setting only improves query performance in very specific
+ circumstances, depending on the query characteristics and data distribution,
+ only use it when you determine through benchmarking that it improves
+ performance of specific expensive queries.
+ Consider setting this query option immediately before the expensive query and
+ unsetting it immediately afterward.
+ </p>
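+
+ <p class="p">
+ The set-and-unset pattern described above can be sketched as follows; the query and table
+ names are hypothetical.
+ </p>
+
+<pre class="pre codeblock"><code>-- Skip per-row filtering for one expensive query, based on benchmarking results...
+set disable_row_runtime_filtering=true;
+select count(*) from big_partitioned_table b
+  join big_table d on (b.join_col = d.join_col);
+
+-- ...then restore the default immediately afterward.
+set disable_row_runtime_filtering=false;</code></pre>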
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option only applies to queries against HDFS-based tables
+ using the Parquet file format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ When applied to a query involving a Kudu table, this option turns off
+ all runtime filtering for the Kudu table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html b/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
new file mode 100644
index 0000000..bf1f9bc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_streaming_preaggregations"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</title></head><body id="disable_streaming_preaggregations"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_STREAMING_PREAGGREGATIONS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Turns off the <span class="q">"streaming preaggregation"</span> optimization that is available in <span class="keyword">Impala 2.5</span>
+ and higher. This optimization reduces unnecessary work performed by queries that perform aggregation
+ operations on columns with few or no duplicate values, for example <code class="ph codeph">DISTINCT <var class="keyword varname">id_column</var></code>
+ or <code class="ph codeph">GROUP BY <var class="keyword varname">unique_column</var></code>. If the optimization causes regressions in
+ existing queries that use aggregation functions, you can turn it off as needed by setting this query option.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value
+ <code class="ph codeph">true</code> is not recognized. This limitation is
+ tracked by the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>,
+ which shows the releases where the problem is fixed.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically, queries that would require enabling this option involve very large numbers of
+ aggregated values, such as a billion or more distinct keys being processed on each
+ worker node.
+ </p>
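+
+ <p class="p">
+ For example, to turn off the optimization around a single high-cardinality aggregation
+ query (the table and column names are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>set disable_streaming_preaggregations=true;
+select count(distinct id_column) from huge_table;
+set disable_streaming_preaggregations=false;</code></pre>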
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_unsafe_spills.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_unsafe_spills.html b/docs/build3x/html/topics/impala_disable_unsafe_spills.html
new file mode 100644
index 0000000..63f1c1b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_unsafe_spills.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_unsafe_spills"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</title></head><body id="disable_unsafe_spills"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_UNSAFE_SPILLS Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than
+ write temporary data to disk.
+ </p>
+
+ <p class="p">
+ Queries that <span class="q">"spill"</span> to disk typically complete successfully, where in earlier Impala releases they would have failed.
+ However, queries with exorbitant memory requirements, due to missing statistics or inefficient join clauses, can
+ become so slow while spilling that you might prefer to have them cancelled automatically and to reduce the memory
+ usage through standard Impala tuning techniques.
+ </p>
+
+ <p class="p">
+ This option prevents only <span class="q">"unsafe"</span> spill operations, meaning that one or more tables are missing
+ statistics or the query does not include a hint to set the most efficient mechanism for a join or
+ <code class="ph codeph">INSERT ... SELECT</code> into a partitioned table. These are the tables most likely to result in
+ suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option enabled is a
+ good way to find tables on which to run the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
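+
+ <p class="p">
+ For example (the table name is hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>-- Make queries fail rather than spill when statistics are missing.
+set disable_unsafe_spills=true;
+
+-- If a query is cancelled under this setting, compute statistics on the
+-- tables involved and run the query again.
+compute stats t1;</code></pre>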
+
+ <p class="p">
+ See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for information about the <span class="q">"spill to disk"</span>
+ feature for queries processing large result sets with joins, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP
+ BY</code>, <code class="ph codeph">DISTINCT</code>, aggregation functions, or analytic functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disk_space.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disk_space.html b/docs/build3x/html/topics/impala_disk_space.html
new file mode 100644
index 0000000..560be2b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disk_space.html
@@ -0,0 +1,133 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disk_space"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Disk Space for Impala Data</title></head><body id="disk_space"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Managing Disk Space for Impala Data</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Impala typically works with many large files in an HDFS storage system with plenty of capacity,
+ there are times when you might perform some file cleanup to reclaim space, or advise developers on techniques
+ to minimize space consumption and file duplication.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Use compact binary file formats where practical. Numeric and time-based data in particular can be stored
+ in more compact form in binary data files. Depending on the file format, various compression and encoding
+ features can reduce file size even further. You can specify the <code class="ph codeph">STORED AS</code> clause as part
+ of the <code class="ph codeph">CREATE TABLE</code> statement, or <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">SET
+ FILEFORMAT</code> clause for an existing table or partition within a partitioned table. See
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about file formats, especially
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details.
+ </p>
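+      <p class="p">
+        For example, a sketch using the Parquet format (the table names and columns are hypothetical):
+      </p>
+<pre class="pre codeblock"><code>-- Store numeric and time-based data compactly in binary Parquet files.
+create table sales (id bigint, sale_ts timestamp, amount decimal(9,2))
+  stored as parquet;
+
+-- Switch an existing table to a different file format; only data
+-- inserted afterward uses the new format.
+alter table legacy_sales set fileformat parquet;</code></pre>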
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You manage underlying data files differently depending on whether the corresponding Impala table is
+ defined as an <a class="xref" href="impala_tables.html#internal_tables">internal</a> or
+ <a class="xref" href="impala_tables.html#external_tables">external</a> table:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to check if a particular table is internal
+ (managed by Impala) or external, and to see the physical location of the data files in HDFS. See
+ <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details.
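+          <p class="p">
+            For example (the table name is hypothetical):
+          </p>
+<pre class="pre codeblock"><code>describe formatted t1;
+-- In the output, the Table Type field shows MANAGED_TABLE or EXTERNAL_TABLE,
+-- and the Location field shows the HDFS path of the data files.</code></pre>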
+ </li>
+
+ <li class="li">
+ For Impala-managed (<span class="q">"internal"</span>) tables, use <code class="ph codeph">DROP TABLE</code> statements to remove
+ data files. See <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details.
+ </li>
+
+ <li class="li">
+ For tables not managed by Impala (<span class="q">"external"</span> tables), use appropriate HDFS-related commands such
+ as <code class="ph codeph">hadoop fs</code>, <code class="ph codeph">hdfs dfs</code>, or <code class="ph codeph">distcp</code>, to create, move,
+ copy, or delete files within HDFS directories that are accessible by the <code class="ph codeph">impala</code> user.
+ Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement after adding or removing any
+ files from the data directory of an external table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for
+ details.
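+          <p class="p">
+            For example, a sketch of adding a file to an external table's data directory and
+            making it visible to Impala (the paths and table name are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>$ hdfs dfs -put new_data.csv /user/impala/data/external_t1/
+$ impala-shell -q 'refresh external_t1'</code></pre>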
+ </li>
+
+ <li class="li">
+ Use external tables to reference HDFS data files in their original location. With this technique, you
+ avoid copying the files, and you can map more than one Impala table to the same set of data files. When
+ you drop the Impala table, the data files are left undisturbed. See
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a> for details.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">LOAD DATA</code> statement to move HDFS files into the data directory for an Impala
+ table from inside Impala, without the need to specify the HDFS path of the destination directory. This
+ technique works for both internal and external tables. See
+ <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details.
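+          <p class="p">
+            For example (the HDFS path and table name are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>load data inpath '/user/etl/staging/batch1' into table t1;</code></pre>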
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Make sure that the HDFS trashcan is configured correctly. When you remove files from HDFS, the space
+ might not be reclaimed for use by other files until sometime later, when the trashcan is emptied. See
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details. See
+ <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for permissions needed for the HDFS trashcan to operate
+ correctly.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Drop all tables in a database before dropping the database itself. See
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Clean up temporary files after failed <code class="ph codeph">INSERT</code> statements. If an <code class="ph codeph">INSERT</code>
+ statement encounters an error, and you see a directory named <span class="ph filepath">.impala_insert_staging</span>
+ or <span class="ph filepath">_impala_insert_staging</span> left behind in the data directory for the table, it might
+ contain temporary data files taking up space in HDFS. You might be able to salvage these data files, for
+ example if they are complete but could not be moved into place due to a permission error. Or, you might
+ delete those files through commands such as <code class="ph codeph">hadoop fs</code> or <code class="ph codeph">hdfs dfs</code>, to
+      reclaim space before retrying the <code class="ph codeph">INSERT</code>. Issue <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code> to see the HDFS path where you can check for temporary files.
+ </p>
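+      <p class="p">
+        For example (the warehouse path and table name are hypothetical):
+      </p>
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/hive/warehouse/db1.db/t1
+$ hdfs dfs -rm -r /user/hive/warehouse/db1.db/t1/_impala_insert_staging</code></pre>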
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+      are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+ operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+ technique, without any name conflicts for these temporary files.) You can specify a different location by
+ starting the <span class="keyword cmdname">impalad</span> daemon with the
+ <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+ You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+ be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+ depending on the capacity and speed
+      of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully starts (with a warning
+      written to the log) if it cannot create or read and write files
+ in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+ Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+ files in a scratch directory during a query, Impala logs the error and the query fails.
+ </p>
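+      <p class="p">
+        For example (the directory paths are hypothetical):
+      </p>
+<pre class="pre codeblock"><code>impalad --scratch_dirs="/data1/impala-scratch,/data2/impala-scratch"</code></pre>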
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use the Amazon Simple Storage Service (S3) as a place to offload
+ data to reduce the volume of local storage, Impala 2.2.0 and higher
+ can query the data directly from S3.
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+ </p>
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav></article></main></body></html>