Posted to commits@impala.apache.org by jb...@apache.org on 2017/04/12 18:25:28 UTC
[24/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS]
Publish rendered Impala documentation to ASF site
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_live_summary.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_live_summary.html b/docs/build/html/topics/impala_live_summary.html
new file mode 100644
index 0000000..cb41693
--- /dev/null
+++ b/docs/build/html/topics/impala_live_summary.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="live_summary"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</title></head><body id="live_summary"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LIVE_SUMMARY Query Option (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ For queries submitted through the <span class="keyword cmdname">impala-shell</span> command,
+ displays the same output as the <code class="ph codeph">SUMMARY</code> command,
+ with the measurements updated in real time as the query progresses.
+ When the query finishes, the final <code class="ph codeph">SUMMARY</code> output remains
+ visible in the <span class="keyword cmdname">impala-shell</span> console output.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+      any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Command-line equivalent:</strong>
+ </p>
+ <p class="p">
+ You can enable this query option within <span class="keyword cmdname">impala-shell</span>
+ by starting the shell with the <code class="ph codeph">--live_summary</code>
+ command-line option.
+ You can still turn this setting off and on again within the shell through the
+ <code class="ph codeph">SET</code> command.
+ </p>
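+    <p class="p">
+      For example, a session might look like the following. (This is a sketch;
+      the shell startup banner is omitted.)
+    </p>
+
+<pre class="pre codeblock"><code>$ impala-shell --live_summary
+[localhost:21000] > set live_summary=false;
+LIVE_SUMMARY set to false
+[localhost:21000] > set live_summary=true;
+LIVE_SUMMARY set to true</code></pre>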
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+      The live summary output can be useful for monitoring long-running queries:
+      it shows which phase of execution takes up the most time, and whether some hosts
+      take much longer than others for certain operations, dragging overall performance down.
+ By making the information available in real time, this feature lets you decide what
+ action to take even before you cancel a query that is taking much longer than normal.
+ </p>
+ <p class="p">
+ For example, you might see the HDFS scan phase taking a long time, and therefore revisit
+ performance-related aspects of your schema design such as constructing a partitioned table,
+ switching to the Parquet file format, running the <code class="ph codeph">COMPUTE STATS</code> statement
+ for the table, and so on.
+ Or you might see a wide variation between the average and maximum times for all hosts to
+ perform some phase of the query, and therefore investigate if one particular host
+ needed more memory or was experiencing a network problem.
+ </p>
+ <p class="p">
+ The output from this query option is printed to standard error. The output is only displayed in interactive mode,
+ that is, not when the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options are used.
+ </p>
+ <p class="p">
+ For a simple and concise way of tracking the progress of an interactive query, see
+ <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ currently do not produce any output during <code class="ph codeph">COMPUTE STATS</code> operations.
+ </p>
+ <div class="p">
+ Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ are available only within the <span class="keyword cmdname">impala-shell</span> interpreter:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You cannot change these query options through the SQL <code class="ph codeph">SET</code>
+ statement using the JDBC or ODBC interfaces. The <code class="ph codeph">SET</code>
+ command in <span class="keyword cmdname">impala-shell</span> recognizes these names as
+ shell-only options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Be careful when using <span class="keyword cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+ system to connect to a system running <span class="keyword">Impala 2.3</span> or higher.
+ The older <span class="keyword cmdname">impala-shell</span> does not recognize these
+ query option names. Upgrade <span class="keyword cmdname">impala-shell</span> on the
+ systems where you intend to use these query options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Likewise, the <span class="keyword cmdname">impala-shell</span> command relies on
+ some information only available in <span class="keyword">Impala 2.3</span> and higher
+ to prepare live progress reports and query summaries. The
+ <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code>
+ query options have no effect when <span class="keyword cmdname">impala-shell</span> connects
+ to a cluster running an older version of Impala.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a series of <code class="ph codeph">LIVE_SUMMARY</code> reports that
+      are displayed during the course of a query, showing how the numbers increase as
+      different phases of the distributed query progress. When you do the same
+ in <span class="keyword cmdname">impala-shell</span>, only a single report is displayed at any one time,
+ with each update overwriting the previous numbers.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set live_summary=true;
+LIVE_SUMMARY set to true
+[localhost:21000] > select count(*) from customer t1 cross join customer t2;
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 0 | 0ns | 0ns | 0 | 22.50B | 0 B | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 1 | 17.62s | 17.62s | 81.14M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 1 | 247.53ms | 247.53ms | 1.02K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 1 | 61.85s | 61.85s | 283.43M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 1 | 247.59ms | 247.59ms | 2.05K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
+</code></pre>
+
+
+
+
+ <p class="p">
+ To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ work in real time, see <a class="xref" href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" target="_blank">this animated demo</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_load_data.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_load_data.html b/docs/build/html/topics/impala_load_data.html
new file mode 100644
index 0000000..e49408b
--- /dev/null
+++ b/docs/build/html/topics/impala_load_data.html
@@ -0,0 +1,306 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="load_data"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LOAD DATA Statement</title></head><body id="load_data"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LOAD DATA Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">LOAD DATA</code> statement streamlines the ETL process for an internal Impala table by moving a
+ data file or all the data files in a directory from an HDFS location into the Impala data directory for that
+ table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LOAD DATA INPATH '<var class="keyword varname">hdfs_file_or_directory_path</var>' [OVERWRITE] INTO TABLE <var class="keyword varname">tablename</var>
+ [PARTITION (<var class="keyword varname">partcol1</var>=<var class="keyword varname">val1</var>, <var class="keyword varname">partcol2</var>=<var class="keyword varname">val2</var> ...)]</code></pre>
+
+ <p class="p">
+ When the <code class="ph codeph">LOAD DATA</code> statement operates on a partitioned table,
+      it always operates on one partition at a time. Specify the <code class="ph codeph">PARTITION</code> clause
+      and list all the partition key columns, with a constant value specified for each.
+ </p>
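+    <p class="p">
+      For example, for a table partitioned by year and month, the statement might look
+      like the following. (The table name and HDFS path are hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>LOAD DATA INPATH '/user/doc_demo/sales_2017_04'
+  INTO TABLE sales_data PARTITION (year=2017, month=4);</code></pre>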
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML (but still affected by
+ <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The loaded data files are moved, not copied, into the Impala data directory.
+ </li>
+
+ <li class="li">
+ You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the
+ files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a
+ directory. When loading a directory full of data files, keep all the data files at the top level, with no
+ nested directories underneath.
+ </li>
+
+ <li class="li">
+ Currently, the Impala <code class="ph codeph">LOAD DATA</code> statement only imports files from HDFS, not from the local
+ filesystem. It does not support the <code class="ph codeph">LOCAL</code> keyword of the Hive <code class="ph codeph">LOAD DATA</code>
+ statement. You must specify a path, not an <code class="ph codeph">hdfs://</code> URI.
+ </li>
+
+ <li class="li">
+ In the interest of speed, only limited error checking is done. If the loaded files have the wrong file
+      format, different columns than the destination table, or some other kind of mismatch, Impala does not raise any
+ error for the <code class="ph codeph">LOAD DATA</code> statement. Querying the table afterward could produce a runtime
+ error or unexpected results. Currently, the only checking the <code class="ph codeph">LOAD DATA</code> statement does is
+ to avoid mixing together uncompressed and LZO-compressed text files in the same table.
+ </li>
+
+ <li class="li">
+ When you specify an HDFS directory name as the <code class="ph codeph">LOAD DATA</code> argument, any hidden files in
+ that directory (files whose names start with a <code class="ph codeph">.</code>) are not moved to the Impala data
+ directory.
+ </li>
+
+ <li class="li">
+      The operation fails if the source directory contains any non-hidden subdirectories.
+      Prior to <span class="keyword">Impala 2.5</span>, the <code class="ph codeph">LOAD DATA</code> statement would fail if the source
+      directory contained any subdirectory, even a hidden one such as
+      <span class="ph filepath">_impala_insert_staging</span>.
+      In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">LOAD DATA</code> ignores hidden subdirectories in the
+      source directory, and fails only if any of the subdirectories are non-hidden.
+ </li>
+
+ <li class="li">
+ The loaded data files retain their original names in the new location, unless a name conflicts with an
+ existing data file, in which case the name of the new file is modified slightly to be unique. (The
+ name-mangling is a slight difference from the Hive <code class="ph codeph">LOAD DATA</code> statement, which replaces
+ identically named files.)
+ </li>
+
+ <li class="li">
+ By providing an easy way to transport files from known locations in HDFS into the Impala data directory
+      structure, the <code class="ph codeph">LOAD DATA</code> statement lets you avoid memorizing the locations and layout of
+      the HDFS directory tree containing the Impala databases and tables. (For a quick way to check the location of
+ the data files for an Impala table, issue the statement <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code>.)
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">PARTITION</code> clause is especially convenient for ingesting new data for a partitioned
+ table. As you receive new data for a time period, geographic region, or other division that corresponds to
+ one or more partitioning columns, you can load that data straight into the appropriate Impala data
+ directory, which might be nested several levels down if the table is partitioned by multiple columns. When
+ the table is partitioned, you must specify constant values for all the partitioning columns.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Impala currently cannot create Parquet data files containing complex types
+ (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>), the
+ <code class="ph codeph">LOAD DATA</code> statement is especially important when working with
+ tables containing complex type columns. You create the Parquet data files outside
+ Impala, then use either <code class="ph codeph">LOAD DATA</code>, an external table, or HDFS-level
+ file operations followed by <code class="ph codeph">REFRESH</code> to associate the data files with
+ the corresponding table.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types.
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
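+    <p class="p">
+      For example, in a load-balanced environment you might enable the option just before the
+      load. (This is a sketch; the file name is a placeholder.)
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set sync_ddl=true;
+SYNC_DDL set to true
+[localhost:21000] > load data inpath '/user/doc_demo/new_strings.txt' into table t1;</code></pre>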
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
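+    <p class="p">
+      For example, after a <code class="ph codeph">LOAD DATA</code> statement you might immediately
+      refresh the statistics. (This sketch reuses the <code class="ph codeph">t1</code> table from the
+      examples in this topic; the file name is a placeholder.)
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > load data inpath '/user/doc_demo/more_strings.txt' into table t1;
+[localhost:21000] > compute stats t1;</code></pre>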
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ First, we use a trivial Python script to write different numbers of strings (one per line) into files stored
+      under the HDFS home directory of the <code class="ph codeph">doc_demo</code> user. (Substitute the path for your own HDFS user account when
+ doing <span class="keyword cmdname">hdfs dfs</span> operations like these.)
+ </p>
+
+<pre class="pre codeblock"><code>$ random_strings.py 1000 | hdfs dfs -put - /user/doc_demo/thousand_strings.txt
+$ random_strings.py 100 | hdfs dfs -put - /user/doc_demo/hundred_strings.txt
+$ random_strings.py 10 | hdfs dfs -put - /user/doc_demo/ten_strings.txt</code></pre>
+
+ <p class="p">
+ Next, we create a table and load an initial set of data into it. Remember, unless you specify a
+ <code class="ph codeph">STORED AS</code> clause, Impala tables default to <code class="ph codeph">TEXTFILE</code> format with Ctrl-A (hex
+ 01) as the field delimiter. This example uses a single-column table, so the delimiter is not significant. For
+ large-scale ETL jobs, you would typically use binary format data files such as Parquet or Avro, and load them
+ into Impala tables that use the corresponding file format.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (s string);
+[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.61s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0 |
++------+
+| 1000 |
++------+
+Returned 1 row(s) in 0.67s
+[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+ERROR: AnalysisException: INPATH location '/user/doc_demo/thousand_strings.txt' does not exist. </code></pre>
+
+ <p class="p">
+ As indicated by the message at the end of the previous example, the data file was moved from its original
+ location. The following example illustrates how the data file was moved into the Impala data directory for
+ the destination table, keeping its original filename:
+ </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/hive/warehouse/load_data_testing.db/t1
+Found 1 items
+-rw-r--r-- 1 doc_demo doc_demo 13926 2013-06-26 15:40 /user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt</code></pre>
+
+ <p class="p">
+ The following example demonstrates the difference between the <code class="ph codeph">INTO TABLE</code> and
+    <code class="ph codeph">OVERWRITE INTO TABLE</code> clauses. The table already contains 1000 rows. After issuing the
+ <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">INTO TABLE</code> clause, the table contains 100 more
+ rows, for a total of 1100. After issuing the <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">OVERWRITE
+ INTO TABLE</code> clause, the former contents are gone, and now the table only contains the 10 rows from
+ the just-loaded data file.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > load data inpath '/user/doc_demo/hundred_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.24s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0 |
++------+
+| 1100 |
++------+
+Returned 1 row(s) in 0.55s
+[localhost:21000] > load data inpath '/user/doc_demo/ten_strings.txt' overwrite into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++-----+
+| _c0 |
++-----+
+| 10 |
++-----+
+Returned 1 row(s) in 0.62s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Amazon Simple Storage Service (S3).
+ The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+ partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+ </p>
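+  <p class="p">
+    For example, a table whose data resides in S3 might be set up as follows, then refreshed
+    after files are copied into the bucket outside Impala. (The bucket name, table name, and
+    columns are placeholders.)
+  </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE s3_sales (id BIGINT, amount DOUBLE)
+  LOCATION 's3a://impala-demo-bucket/s3_sales/';
+-- After copying data files into the bucket through normal S3 transfer mechanisms:
+REFRESH s3_sales;</code></pre>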
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+ <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and write
+ permissions for the files in the source directory, and write
+ permission for the destination directory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement is an alternative to the
+ <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement.
+ Use <code class="ph codeph">LOAD DATA</code>
+ when you have the data files in HDFS but outside of any Impala table.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement is also an alternative
+ to the <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement. Use
+ <code class="ph codeph">LOAD DATA</code> when it is appropriate to move the
+ data files under Impala control rather than querying them
+ from their original location. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+ for information about working with external tables.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_logging.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_logging.html b/docs/build/html/topics/impala_logging.html
new file mode 100644
index 0000000..049f370
--- /dev/null
+++ b/docs/build/html/topics/impala_logging.html
@@ -0,0 +1,416 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="logging"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala Logging</title></head><body id="logging"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala Logging</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala logs record information about:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and
+ troubleshoot that problem before you can do anything further with Impala.
+ </li>
+
+ <li class="li">
+ How Impala is configured.
+ </li>
+
+ <li class="li">
+ Jobs Impala has completed.
+ </li>
+ </ul>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Formerly, the logs contained the query profile for each query, showing low-level details of how the work is
+ distributed among nodes and how intermediate and final results are transmitted across the network. To save
+ space, those query profiles are now stored in zlib-compressed files in
+ <span class="ph filepath">/var/log/impala/profiles</span>. You can access them through the Impala web user interface.
+ For example, at <code class="ph codeph">http://<var class="keyword varname">impalad-node-hostname</var>:25000/queries</code>, each query
+ is followed by a <code class="ph codeph">Profile</code> link leading to a page showing extensive analytical data for the
+ query execution.
+ </p>
+
+ <p class="p">
+ The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when
+ enabled. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+ </p>
+
+ <p class="p">
+ The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when
+ enabled. See <a class="xref" href="impala_lineage.html#lineage">Viewing Lineage Information for Impala Data</a> for details.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="logging__logs_details">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Locations and Names of Impala Log Files</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ By default, the log files are under the directory <span class="ph filepath">/var/log/impala</span>.
+ To change log file locations, modify the defaults file described in
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">impalad</code> process are <span class="ph filepath">impalad.INFO</span>,
+ <span class="ph filepath">impalad.WARNING</span>, and <span class="ph filepath">impalad.ERROR</span>. You might also see a file
+ <span class="ph filepath">impalad.FATAL</span>, although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">statestored</code> process are
+ <span class="ph filepath">statestored.INFO</span>, <span class="ph filepath">statestored.WARNING</span>, and
+ <span class="ph filepath">statestored.ERROR</span>. You might also see a file <span class="ph filepath">statestored.FATAL</span>,
+ although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">catalogd</code> process are <span class="ph filepath">catalogd.INFO</span>,
+ <span class="ph filepath">catalogd.WARNING</span>, and <span class="ph filepath">catalogd.ERROR</span>. You might also see a file
+ <span class="ph filepath">catalogd.FATAL</span>, although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">.INFO</code> files to see configuration settings for the processes.
+ </li>
+
+      <li class="li">
+        Examine the <code class="ph codeph">.WARNING</code> files to see all kinds of problem information, ranging
+        from suboptimal settings to serious runtime errors.
+      </li>
+
+      <li class="li">
+        Examine the <code class="ph codeph">.ERROR</code> and/or <code class="ph codeph">.FATAL</code> files to see only the most serious
+        errors, such as when a process crashes or queries fail to complete. These messages are also in the
+        <code class="ph codeph">.WARNING</code> file.
+      </li>
+
+ <li class="li">
+ A new set of log files is produced each time the associated daemon is restarted. These log files have
+ long names including a timestamp. The <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and
+ <code class="ph codeph">.ERROR</code> files are physically represented as symbolic links to the latest applicable log
+ files.
+ </li>
+
+      <li class="li">
+        The init script for the <code class="ph codeph">impala-server</code> service also produces a consolidated log file
+        <code class="ph codeph">/var/log/impalad/impala-server.log</code>, with all the same information as the
+        corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+      </li>
+
+      <li class="li">
+        The init script for the <code class="ph codeph">impala-state-store</code> service also produces a consolidated log file
+        <code class="ph codeph">/var/log/impalad/impala-state-store.log</code>, with all the same information as the
+        corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+      </li>
+ </ul>
+
+      <p class="p">
+        Impala stores information using the GLOG logging system. You will see some messages
+        referring to C++ file names. Logging is affected by:
+      </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">GLOG_v</code> environment variable specifies which types of messages are logged. See
+ <a class="xref" href="#log_levels">Setting Logging Levels</a> for details.
+ </li>
+
+ <li class="li">
+        The <code class="ph codeph">-logbuflevel</code> startup flag for the <span class="keyword cmdname">impalad</span> daemon specifies how
+        often the log information is written to disk. The default is 0, meaning that the log is immediately
+        flushed to disk when Impala outputs an important message such as a warning or an error, while less
+        important messages, such as informational ones, are buffered in memory rather than being flushed to disk
+        immediately.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="logging__logs_managing">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Managing Impala Logs</h2>
+
+
+ <div class="body conbody">
+
+      <p class="p">
+        Review Impala log files on each host when you have traced an issue back to a specific system.
+      </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="logging__logs_rotate">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Rotating Impala Logs</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala periodically switches the physical files representing the current log files, after which it is safe
+ to remove the old files if they are no longer needed.
+ </p>
+
+ <p class="p">
+ Impala can automatically remove older unneeded log files, a feature known as <dfn class="term">log rotation</dfn>.
+
+ </p>
+
+ <p class="p">
+ In Impala 2.2 and higher, the <code class="ph codeph">-max_log_files</code> configuration option specifies how many log
+ files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon
+ (<span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and <span class="keyword cmdname">catalogd</span>). The default
+ value is 10, meaning that Impala preserves the latest 10 log files for each severity level
+ (<code class="ph codeph">INFO</code>, <code class="ph codeph">WARNING</code>, <code class="ph codeph">ERROR</code>, and <code class="ph codeph">FATAL</code>).
+ Impala checks to see if any old logs need to be removed based on the interval specified in the
+ <code class="ph codeph">logbufsecs</code> setting, every 5 seconds by default.
+ </p>
+
+
+
+      <p class="p">
+        A value of 0 preserves all log files, in which case you would set up manual log rotation using your
+        Linux tool or technique of choice. A value of 1 preserves only the very latest log file.
+      </p>
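+
+      <p class="p">
+        As an illustrative sketch, assuming your installation uses the defaults file described in
+        <a class="xref" href="impala_processes.html#processes">Starting Impala</a> with an
+        <code class="ph codeph">IMPALA_SERVER_ARGS</code> variable, you might keep the 20 most recent log files
+        of each severity level by adding the flag to the daemon startup options:
+      </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+    -log_dir=/var/log/impala \
+    -max_log_files=20"</code></pre>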
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="logging__logs_debug">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Reviewing Impala Logs</h2>
+
+ <div class="body conbody">
+
+      <p class="p">
+        By default, the Impala log is stored at <code class="ph codeph">/var/log/impalad/</code>. The most comprehensive log,
+        showing informational, warning, and error messages, is in the file named <span class="ph filepath">impalad.INFO</span>.
+        View log file contents by using the web interface or by examining the contents of the log file. (When you
+        examine the logs through the file system, you can troubleshoot problems by reading the
+        <span class="ph filepath">impalad.WARNING</span> and/or <span class="ph filepath">impalad.ERROR</span> files, which contain the
+        subsets of messages indicating potential problems.)
+      </p>
+
+ <p class="p">
+ On a machine named <code class="ph codeph">impala.example.com</code> with default settings, you could view the Impala
+ logs on that machine by using a browser to access <code class="ph codeph">http://impala.example.com:25000/logs</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The web interface limits the amount of logging information displayed. To view every log entry, access the
+ log files directly through the file system.
+ </p>
+ </div>
+
+ <p class="p">
+ You can view the contents of the <code class="ph codeph">impalad.INFO</code> log file in the file system. With the
+ default configuration settings, the start of the log file appears as follows:
+ </p>
+
+<pre class="pre codeblock"><code>[user@example impalad]$ pwd
+/var/log/impalad
+[user@example impalad]$ more impalad.INFO
+Log file created at: 2013/01/07 08:42:12
+Running on machine: impala.example.com
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
+Built on Fri, 21 Dec 2012 12:55:19 PST
+I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
+I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
+--dump_ir=false
+--module_output=
+--be_port=22000
+--classpath=
+--hostname=impala.example.com</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The preceding example shows only a small part of the log file. Impala log files are often several megabytes
+ in size.
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="logging__log_format">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Understanding Impala Log Contents</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The logs store information about Impala startup options. This information appears once for each time Impala
+ is started and may include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Machine name.
+ </li>
+
+ <li class="li">
+ Impala version number.
+ </li>
+
+ <li class="li">
+ Flags used to start Impala.
+ </li>
+
+ <li class="li">
+ CPU information.
+ </li>
+
+ <li class="li">
+ The number of available disks.
+ </li>
+ </ul>
+
+      <p class="p">
+        There is information about each job Impala has run. Because each Impala job creates an additional set of
+        data about queries, the amount of job-specific data may be very large. Logs may contain detailed
+        information on jobs. These detailed log entries may include:
+      </p>
+
+ <ul class="ul">
+ <li class="li">
+ The composition of the query.
+ </li>
+
+ <li class="li">
+ The degree of data locality.
+ </li>
+
+ <li class="li">
+ Statistics on data throughput and response times.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="logging__log_levels">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Setting Logging Levels</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala uses the GLOG system, which supports three logging levels. You can adjust logging levels
+ by exporting variable settings. To change logging settings manually, use a command
+ similar to the following on each node before starting <code class="ph codeph">impalad</code>:
+ </p>
+
+<pre class="pre codeblock"><code>export GLOG_v=1</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For performance reasons, do not enable the most verbose logging level of 3 unless there is
+ no other alternative for troubleshooting.
+ </div>
+
+ <p class="p">
+ For more information on how to configure GLOG, including how to set variable logging levels for different
+ system components, see
+ <a class="xref" href="http://google-glog.googlecode.com/svn/trunk/doc/glog.html" target="_blank">How
+ To Use Google Logging Library (glog)</a>.
+ </p>
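+
+      <p class="p">
+        As a sketch, glog also supports per-module verbosity through the <code class="ph codeph">GLOG_vmodule</code>
+        environment variable, which matches C++ source file names without the extension. The module name
+        <code class="ph codeph">coordinator</code> here is only an illustration:
+      </p>
+
+<pre class="pre codeblock"><code>export GLOG_v=1
+export GLOG_vmodule=coordinator=2</code></pre>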
+
+ <section class="section" id="log_levels__loglevels_details"><h3 class="title sectiontitle">Understanding What is Logged at Different Logging Levels</h3>
+
+
+
+ <p class="p">
+ As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2
+ records everything GLOG_v=1 records, as well as additional information.
+ </p>
+
+ <p class="p">
+ Increasing logging levels imposes performance overhead and increases log size. Where practical, use
+ GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful
+ troubleshooting information.
+ </p>
+
+ <p class="p">
+ Additional information logged at each level is as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ GLOG_v=1 - The default level. Logs information about each connection and query that is initiated to an
+ <code class="ph codeph">impalad</code> instance, including runtime profiles.
+ </li>
+
+ <li class="li">
+ GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also
+ records query execution progress information, including details on each file that is read.
+ </li>
+
+ <li class="li">
+ GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is
+ only applicable for the most serious troubleshooting and tuning scenarios, because it can produce
+ exceptionally large and detailed log files, potentially leading to its own set of performance and
+ capacity problems.
+ </li>
+ </ul>
+
+ </section>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="logging__redaction">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Redacting Sensitive Information from Impala Log Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ <dfn class="term">Log redaction</dfn> is a security feature that prevents sensitive information from being displayed in
+ locations used by administrators for monitoring and troubleshooting, such as log files and the Impala debug web
+ user interface. You configure regular expressions that match sensitive types of information processed by your
+ system, such as credit card numbers or tax IDs, and literals matching these patterns are obfuscated wherever
+ they would normally be recorded in log files or displayed in administration or debugging user interfaces.
+ </p>
+
+ <p class="p">
+ In a security context, the log redaction feature is complementary to the Sentry authorization framework.
+ Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents
+ administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying
+ information (PII) that might appear in queries issued by those authorized users.
+ </p>
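+
+      <p class="p">
+        A redaction rule is typically expressed as a search pattern plus a replacement string. The following
+        shows only the general shape of such a rule, for example to mask credit card numbers; it is not the
+        exact configuration syntax of any particular distribution:
+      </p>
+
+<pre class="pre codeblock"><code>search:  \d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4}
+replace: XXXX-XXXX-XXXX-XXXX</code></pre>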
+
+ <p class="p">
+ See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details about how to enable this feature and set
+ up the regular expressions to detect and redact sensitive information within SQL statement text.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_map.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_map.html b/docs/build/html/topics/impala_map.html
new file mode 100644
index 0000000..31d91f0
--- /dev/null
+++ b/docs/build/html/topics/impala_map.html
@@ -0,0 +1,331 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="map"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAP Complex Type (Impala 2.3 or higher only)</title></head><body id="map"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAP Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A complex data type representing an arbitrary set of key-value pairs.
+ The key part is a scalar type, while the value part can be a scalar or
+ another complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> MAP < <var class="keyword varname">primitive_type</var>, <var class="keyword varname">type</var> >
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because complex types are often used in combination,
+ for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements, if you are unfamiliar with the Impala complex types,
+ start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+ background information and usage examples.
+ </p>
+
+    <p class="p">
+      The <code class="ph codeph">MAP</code> complex data type represents a set of key-value pairs.
+      Each element of the map is indexed by a primitive type such as <code class="ph codeph">BIGINT</code> or
+      <code class="ph codeph">STRING</code>, letting you define sequences that are not contiguous, or categories with arbitrary names.
+      You might find it convenient for modelling data produced in other languages, such as a
+      Python dictionary or Java HashMap, where a single scalar value serves as the lookup key.
+    </p>
+
+    <p class="p">
+      In a big data context, the keys in a map column might represent a numeric sequence of events during a
+      manufacturing process, or <code class="ph codeph">TIMESTAMP</code> values corresponding to sensor observations.
+      The map itself is inherently unordered, so you choose whether to make the key values significant
+      (such as a recorded <code class="ph codeph">TIMESTAMP</code>) or synthetic (such as a randomly generated universally unique ID).
+    </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Behind the scenes, the <code class="ph codeph">MAP</code> type is implemented in a similar way as the
+ <code class="ph codeph">ARRAY</code> type. Impala does not enforce any uniqueness constraint on the
+ <code class="ph codeph">KEY</code> values, and the <code class="ph codeph">KEY</code> values are processed by
+ looping through the elements of the <code class="ph codeph">MAP</code> rather than by a constant-time lookup.
+ Therefore, this type is primarily for ease of understanding when importing data and
+ algorithms from non-SQL contexts, rather than optimizing the performance of key lookups.
+ </div>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Columns with this data type can only be used in tables or partitions with the Parquet file format.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Columns with this data type cannot be used as partition key columns in a partitioned table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p" id="map__d6e2889">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+ and associated guidelines about complex type columns.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+ <p class="p">
+ The following example shows a table with various kinds of <code class="ph codeph">MAP</code> columns,
+ both at the top level and nested within other complex types.
+ Each row represents information about a specific country, with complex type fields
+ of various levels of nesting to represent different information associated
+ with the country: factual measurements such as area and population,
+ notable people in different categories, geographic features such as
+ cities, points of interest within each city, and mountains with associated facts.
+ Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+ using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo
+(
+ country_id BIGINT,
+
+-- Numeric facts about each country, looked up by name.
+-- For example, 'Area':1000, 'Population':999999.
+-- Using a MAP instead of a STRUCT because there could be
+-- a different set of facts for each country.
+ metrics MAP <STRING, BIGINT>,
+
+-- MAP whose value part is an ARRAY.
+-- For example, the key 'Famous Politicians' could represent an array of 10 elements,
+-- while the key 'Famous Actors' could represent an array of 20 elements.
+ notables MAP <STRING, ARRAY <STRING>>,
+
+-- MAP that is a field within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+-- For example, city #1 might have points of interest with key 'Zoo',
+-- representing an array of 3 different zoos.
+-- City #2 might have completely different kinds of points of interest.
+-- Because the set of field names is potentially large, and most entries could be blank,
+-- a MAP makes more sense than a STRUCT to represent such a sparse data structure.
+ cities ARRAY < STRUCT <
+ name: STRING,
+ points_of_interest: MAP <STRING, ARRAY <STRING>>
+ >>,
+
+-- MAP that is an element within an ARRAY. The MAP is inside a STRUCT field to associate
+-- the mountain name with all the facts about the mountain.
+-- The "key" of the map (the first STRING field) represents the name of some fact whose value
+-- can be expressed as an integer, such as 'Height', 'Year First Climbed', and so on.
+ mountains ARRAY < STRUCT < name: STRING, facts: MAP <STRING, INT > > >
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>DESCRIBE map_demo;
++------------+------------------------------------------------+
+| name | type |
++------------+------------------------------------------------+
+| country_id | bigint |
+| metrics | map<string,bigint> |
+| notables | map<string,array<string>> |
+| cities | array<struct< |
+| | name:string, |
+| | points_of_interest:map<string,array<string>> |
+| | >> |
+| mountains | array<struct< |
+| | name:string, |
+| | facts:map<string,int> |
+| | >> |
++------------+------------------------------------------------+
+
+DESCRIBE map_demo.metrics;
++-------+--------+
+| name | type |
++-------+--------+
+| key | string |
+| value | bigint |
++-------+--------+
+
+DESCRIBE map_demo.notables;
++-------+---------------+
+| name | type |
++-------+---------------+
+| key | string |
+| value | array<string> |
++-------+---------------+
+
+DESCRIBE map_demo.notables.value;
++------+--------+
+| name | type |
++------+--------+
+| item | string |
+| pos | bigint |
++------+--------+
+
+DESCRIBE map_demo.cities;
++------+------------------------------------------------+
+| name | type |
++------+------------------------------------------------+
+| item | struct< |
+| | name:string, |
+| | points_of_interest:map<string,array<string>> |
+| | > |
+| pos | bigint |
++------+------------------------------------------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest;
++-------+---------------+
+| name | type |
++-------+---------------+
+| key | string |
+| value | array<string> |
++-------+---------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest.value;
++------+--------+
+| name | type |
++------+--------+
+| item | string |
+| pos | bigint |
++------+--------+
+
+DESCRIBE map_demo.mountains;
++------+-------------------------+
+| name | type |
++------+-------------------------+
+| item | struct< |
+| | name:string, |
+| | facts:map<string,int> |
+| | > |
+| pos | bigint |
++------+-------------------------+
+
+DESCRIBE map_demo.mountains.item.facts;
++-------+--------+
+| name | type |
++-------+--------+
+| key | string |
+| value | int |
++-------+--------+
+
+</code></pre>
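+
+  <p class="p">
+    As a sketch of the query syntax (assuming the table from the preceding example has been populated
+    with data), you reference the <code class="ph codeph">MAP</code> column in the <code class="ph codeph">FROM</code> clause
+    as if it were a joined table, and refer to its <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code>
+    pseudocolumns elsewhere in the query:
+  </p>
+
+<pre class="pre codeblock"><code>SELECT country_id, m.key, m.value
+  FROM map_demo, map_demo.metrics AS m
+ WHERE m.key = 'Population';</code></pre>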
+
+ <p class="p">
+ The following example shows a table that uses a variety of data types for the <code class="ph codeph">MAP</code>
+ <span class="q">"key"</span> field. Typically, you use <code class="ph codeph">BIGINT</code> or <code class="ph codeph">STRING</code> to use
+ numeric or character-based keys without worrying about exceeding any size or length constraints.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo_obscure
+(
+ id BIGINT,
+ m1 MAP <INT, INT>,
+ m2 MAP <SMALLINT, INT>,
+ m3 MAP <TINYINT, INT>,
+ m4 MAP <TIMESTAMP, INT>,
+ m5 MAP <BOOLEAN, INT>,
+ m6 MAP <CHAR(5), INT>,
+ m7 MAP <VARCHAR(25), INT>,
+ m8 MAP <FLOAT, INT>,
+ m9 MAP <DOUBLE, INT>,
+ m10 MAP <DECIMAL(12,2), INT>
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>CREATE TABLE celebrities (name STRING, birth_year MAP < STRING, SMALLINT >) STORED AS PARQUET;
+-- A typical row might represent values with 2 different birth years, such as:
+-- ("Joe Movie Star", { "real": 1972, "claimed": 1977 })
+
+CREATE TABLE countries (name STRING, famous_leaders MAP < INT, STRING >) STORED AS PARQUET;
+-- A typical row might represent values with different leaders, with key values corresponding to their numeric sequence, such as:
+-- ("United States", { 1: "George Washington", 3: "Thomas Jefferson", 16: "Abraham Lincoln" })
+
+CREATE TABLE airlines (name STRING, special_meals MAP < STRING, MAP < STRING, STRING > >) STORED AS PARQUET;
+-- A typical row might represent values with multiple kinds of meals, each with several components:
+-- ("Elegant Airlines",
+-- {
+-- "vegetarian": { "breakfast": "pancakes", "snack": "cookies", "dinner": "rice pilaf" },
+-- "gluten free": { "breakfast": "oatmeal", "snack": "fruit", "dinner": "chicken" }
+-- } )
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+
+ </p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file