Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:56 UTC
[47/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_appx_median.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_appx_median.html b/docs/build3x/html/topics/impala_appx_median.html
new file mode 100644
index 0000000..3003ec0
--- /dev/null
+++ b/docs/build3x/html/topics/impala_appx_median.html
@@ -0,0 +1,132 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_median"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_MEDIAN Function</title></head><body id="appx_median"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">APPX_MEDIAN Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns a value that is approximately the median (midpoint) of values in the set
+ of input values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>APPX_MEDIAN([DISTINCT | ALL] <var class="keyword varname">expression</var>)
+</code></pre>
+
+ <p class="p">
+ This function works with any input type, because the only requirement is that the type supports less-than and
+ greater-than comparison operators.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the return value represents the estimated midpoint, it might not reflect the precise midpoint value,
+ especially if the cardinality of the input values is very high. If the cardinality is low (up to
+ approximately 20,000), the result is more accurate because the sampling considers all or almost all of the
+ different values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments, which produce a <code class="ph codeph">STRING</code> result.
+ </p>
+
+ <p class="p">
+ The return value is always the same as one of the input values, not an <span class="q">"in-between"</span> value produced by
+ averaging.
+ </p>
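+
+ <p class="p">
+ As a rough check of this property, the following sketch counts the rows whose value equals the estimate. It reuses
+ the <code class="ph codeph">million_numbers</code> table and the subquery pattern from the examples later in this topic,
+ and assumes that the estimate compares equal to the stored column value. Under that assumption, the count should
+ be at least 1.
+ </p>
+
+<pre class="pre codeblock"><code>-- Sketch only: the alias MATCHING_ROWS is illustrative, not part of the original examples.
+select count(*) as matching_rows from million_numbers
+  where x = (select appx_median(x) from million_numbers);
+</code></pre>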
+
+
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">APPX_MEDIAN</code> function returns only the first 10 characters for
+ string values (string, varchar, char). Additional characters are truncated.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses a table of a million random floating-point numbers ranging up to approximately
+ 50,000. The average is approximately 25,000. Because of the random distribution, we would expect the median
+ to be close to this same number. Computing the precise median is a more intensive operation than computing
+ the average, because it requires keeping track of every distinct value and how many times each occurs. The
+ <code class="ph codeph">APPX_MEDIAN()</code> function uses a sampling algorithm to return an approximate result, which in
+ this case is close to the expected value. To make sure that the value is not substantially out of range due
+ to a skewed distribution, subsequent queries confirm that there are approximately 500,000 values higher than
+ the <code class="ph codeph">APPX_MEDIAN()</code> value, and approximately 500,000 values lower than the
+ <code class="ph codeph">APPX_MEDIAN()</code> value.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select min(x), max(x), avg(x) from million_numbers;
++-------------------+-------------------+-------------------+
+| min(x)            | max(x)            | avg(x)            |
++-------------------+-------------------+-------------------+
+| 4.725693727250069 | 49994.56852674231 | 24945.38563793553 |
++-------------------+-------------------+-------------------+
+[localhost:21000] &gt; select appx_median(x) from million_numbers;
++----------------+
+| appx_median(x) |
++----------------+
+| 24721.6        |
++----------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x &gt; (select appx_median(x) from million_numbers);
++--------+
+| higher |
++--------+
+| 502013 |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x &lt; (select appx_median(x) from million_numbers);
++--------+
+| lower  |
++--------+
+| 497987 |
++--------+
+</code></pre>
+
+ <p class="p">
+ The following example computes the approximate median using a subset of the values from the table, and then
+ confirms that the result is a reasonable estimate for the midpoint.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select appx_median(x) from million_numbers where x between 1000 and 5000;
++-------------------+
+| appx_median(x)    |
++-------------------+
+| 3013.107787358159 |
++-------------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x between 1000 and 5000 and x &gt; 3013.107787358159;
++--------+
+| higher |
++--------+
+| 37692  |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x between 1000 and 5000 and x &lt; 3013.107787358159;
++-------+
+| lower |
++-------+
+| 37089 |
++-------+
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_array.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_array.html b/docs/build3x/html/topics/impala_array.html
new file mode 100644
index 0000000..caddc89
--- /dev/null
+++ b/docs/build3x/html/topics/impala_array.html
@@ -0,0 +1,321 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="array"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ARRAY Complex Type (Impala 2.3 or higher only)</title></head><body id="array"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ARRAY Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A complex data type that can represent an arbitrary number of ordered elements.
+ The elements can be scalars or another complex type (<code class="ph codeph">ARRAY</code>,
+ <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> ARRAY &lt; <var class="keyword varname">type</var> &gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Complex types are often used in combination,
+ for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements. If you are unfamiliar with the Impala complex types,
+ start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+ background information and usage examples.
+ </p>
+
+ <p class="p">
+ The elements of the array have no names. You refer to the value of the array item using the
+ <code class="ph codeph">ITEM</code> pseudocolumn, or its position in the array with the <code class="ph codeph">POS</code>
+ pseudocolumn. See <a class="xref" href="impala_complex_types.html#item">ITEM and POS Pseudocolumns</a> for information about
+ these pseudocolumns.
+ </p>
+
+
+
+ <p class="p">
+ Each row can have a different number of elements (including none) in the array for that row.
+ </p>
+
+
+
+ <p class="p">
+ When an array contains items of scalar types, you can use aggregation functions on the array elements without using join notation. For
+ example, you can find the <code class="ph codeph">COUNT()</code>, <code class="ph codeph">AVG()</code>, <code class="ph codeph">SUM()</code>, and so on of numeric array
+ elements, or the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> of any scalar array elements by referring to
+ <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">array_column</var></code> in the <code class="ph codeph">FROM</code> clause of the query. When
+ you need to cross-reference values from the array with scalar values from the same row, such as by including a <code class="ph codeph">GROUP
+ BY</code> clause to produce a separate aggregated result for each row, then the join clause is required.
+ </p>
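+
+ <p class="p">
+ As a minimal sketch of the two cases, the following queries use the <code class="ph codeph">ARRAY_DEMO</code> table
+ defined in the examples later in this topic. The column aliases are illustrative, and the queries are a sketch
+ rather than verified output.
+ </p>
+
+<pre class="pre codeblock"><code>-- Aggregate over all PETS elements in the table; no join notation is needed.
+SELECT COUNT(*) AS total_pets, MIN(item) AS first_name, MAX(item) AS last_name
+  FROM array_demo.pets;
+
+-- One aggregated result per row of ARRAY_DEMO; the join notation is required
+-- so that ID and NAME can be cross-referenced with the array elements.
+SELECT id, name, COUNT(pets.item) AS num_pets
+  FROM array_demo, array_demo.pets
+GROUP BY id, name;
+</code></pre>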
+
+ <p class="p">
+ A common usage pattern with complex types is to have an array as the top-level type for the column:
+ an array of structs, an array of maps, or an array of arrays.
+ For example, you can model a denormalized table by creating a column that is an <code class="ph codeph">ARRAY</code>
+ of <code class="ph codeph">STRUCT</code> elements; each item in the array represents a row from a table that would
+ normally be used in a join query. This kind of data structure lets you essentially denormalize tables by
+ associating multiple rows from one table with the matching row in another table.
+ </p>
+
+ <p class="p">
+ You typically do not create more than one top-level <code class="ph codeph">ARRAY</code> column, because if there is
+ some relationship between the elements of multiple arrays, it is convenient to model the data as
+ an array of another complex type element (either <code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Columns with this data type can only be used in tables or partitions with the Parquet file format.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Columns with this data type cannot be used as partition key columns in a partitioned table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p" id="array__d6e3285">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+ and associated guidelines about complex type columns.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+ <p class="p">
+ The following example shows how to construct a table with various kinds of <code class="ph codeph">ARRAY</code> columns,
+ both at the top level and nested within other complex types.
+ Whenever the <code class="ph codeph">ARRAY</code> consists of scalar values, such as in the <code class="ph codeph">PETS</code>
+ column or the <code class="ph codeph">CHILDREN</code> field, you can see that future expansion is limited.
+ For example, you could not easily evolve the schema to record the kind of pet or the child's birthday alongside the name.
+ Therefore, it is more common to use an <code class="ph codeph">ARRAY</code> whose elements are of <code class="ph codeph">STRUCT</code> type,
+ to associate multiple fields with each array element.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+ using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+ </div>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE array_demo
+(
+  id BIGINT,
+  name STRING,
+-- An ARRAY of scalar type as a top-level column.
+  pets ARRAY &lt;STRING&gt;,
+
+-- An ARRAY with elements of complex type (STRUCT).
+  places_lived ARRAY &lt; STRUCT &lt;
+    place: STRING,
+    start_year: INT
+  &gt;&gt;,
+
+-- An ARRAY as a field (CHILDREN) within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+  marriages ARRAY &lt; STRUCT &lt;
+    spouse: STRING,
+    children: ARRAY &lt;STRING&gt;
+  &gt;&gt;,
+
+-- An ARRAY as the value part of a MAP.
+-- The first MAP field (the key) would be a value such as
+-- 'Parent' or 'Grandparent', and the corresponding array would
+-- represent 2 parents, 4 grandparents, and so on.
+  ancestors MAP &lt; STRING, ARRAY &lt;STRING&gt; &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+ <p class="p">
+ The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">ARRAY</code> columns by using the
+ <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">ARRAY</code> as its own two-column table, with columns
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>DESCRIBE array_demo;
++--------------+---------------------------+
+| name         | type                      |
++--------------+---------------------------+
+| id           | bigint                    |
+| name         | string                    |
+| pets         | array&lt;string&gt;             |
+| places_lived | array&lt;struct&lt;             |
+|              |   place:string,           |
+|              |   start_year:int          |
+|              | &gt;&gt;                        |
+| marriages    | array&lt;struct&lt;             |
+|              |   spouse:string,          |
+|              |   children:array&lt;string&gt;  |
+|              | &gt;&gt;                        |
+| ancestors    | map&lt;string,array&lt;string&gt;&gt; |
++--------------+---------------------------+
+
+DESCRIBE array_demo.pets;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE array_demo.marriages;
++------+--------------------------+
+| name | type                     |
++------+--------------------------+
+| item | struct&lt;                  |
+|      |   spouse:string,         |
+|      |   children:array&lt;string&gt; |
+|      | &gt;                        |
+| pos  | bigint                   |
++------+--------------------------+
+
+DESCRIBE array_demo.places_lived;
++------+------------------+
+| name | type             |
++------+------------------+
+| item | struct&lt;          |
+|      |   place:string,  |
+|      |   start_year:int |
+|      | &gt;                |
+| pos  | bigint           |
++------+------------------+
+
+DESCRIBE array_demo.ancestors;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows queries involving <code class="ph codeph">ARRAY</code> columns containing elements of scalar or complex types. You
+ <span class="q">"unpack"</span> each <code class="ph codeph">ARRAY</code> column by referring to it in a join query, as if it were a separate table with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. If the array element is a scalar type, you refer to its value using the
+ <code class="ph codeph">ITEM</code> pseudocolumn. If the array element is a <code class="ph codeph">STRUCT</code>, you refer to the <code class="ph codeph">STRUCT</code> fields
+ using dot notation and the field names. If the array element is another <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, you use
+ another level of join to unpack the nested collection elements.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>-- Array of scalar values.
+-- Each array element represents a single string, plus we know its position in the array.
+SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets;
+
+-- Array of structs.
+-- Now each array element has named fields, possibly of different types.
+-- You can consider an ARRAY of STRUCT to represent a table inside another table.
+SELECT id, name, places_lived.pos, places_lived.item.place, places_lived.item.start_year
+FROM array_demo, array_demo.places_lived;
+
+-- The .ITEM name is optional for array elements that are structs.
+-- The following query is equivalent to the previous one, with .ITEM
+-- removed from the column references.
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+ FROM array_demo, array_demo.places_lived;
+
+-- To filter specific items from the array, do comparisons against the .POS or .ITEM
+-- pseudocolumns, or names of struct fields, in the WHERE clause.
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+ WHERE pets.pos in (0, 1, 3);
+
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+ WHERE pets.item LIKE 'Mr. %';
+
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+ FROM array_demo, array_demo.places_lived
+WHERE places_lived.place like '%California%';
+
+</code></pre>
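+
+ <p class="p">
+ The preceding queries do not unpack a nested collection. As a sketch of the extra level of join, the following
+ query reaches the <code class="ph codeph">CHILDREN</code> array inside each <code class="ph codeph">MARRIAGES</code> element.
+ The table aliases <code class="ph codeph">a</code>, <code class="ph codeph">m</code>, and <code class="ph codeph">c</code> and the
+ column aliases are illustrative choices, not names from the table definition.
+ </p>
+
+<pre class="pre codeblock"><code>-- Array of structs, where each struct contains another array.
+-- The first join unpacks MARRIAGES; the second unpacks CHILDREN for each marriage.
+SELECT a.id, a.name, m.spouse, c.pos AS child_pos, c.item AS child
+  FROM array_demo a, a.marriages m, m.children c;
+</code></pre>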
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_auditing.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_auditing.html b/docs/build3x/html/topics/impala_auditing.html
new file mode 100644
index 0000000..bbdca95
--- /dev/null
+++ b/docs/build3x/html/topics/impala_auditing.html
@@ -0,0 +1,232 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="auditing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Auditing Impala Operations</title></head><body id="auditing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Auditing Impala Operations</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To monitor how Impala data is being used within your organization, to verify
+ that your Impala authorization and authentication policies are effective,
+ and to detect attempts at intrusion or unauthorized access to Impala
+ data, you can use the auditing feature in Impala 1.2.1 and higher:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Enable auditing by including the option
+ <code class="ph codeph">--audit_event_log_dir=<var class="keyword varname">directory_path</var></code>
+ in your <span class="keyword cmdname">impalad</span> startup options.
+ The log directory must be a local directory on the
+ server, not an HDFS directory.
+ </li>
+
+ <li class="li">
+ Decide how many queries will be represented in each audit event log file. By default,
+ Impala starts a new audit event log file every 5000 queries. To specify a different number,
+ <span class="ph">include
+ the option <code class="ph codeph">--max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code>
+ in the <span class="keyword cmdname">impalad</span> startup options</span>.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.9</span> and higher, you can control how many
+ audit event log files are kept on each host. Specify the option
+ <code class="ph codeph">--max_audit_event_log_files=<var class="keyword varname">number_of_log_files</var></code>
+ in the <span class="keyword cmdname">impalad</span> startup options. Once the limit is reached, older
+ files are rotated out using the same mechanism as for other Impala log files.
+ The default value for this setting is 0, representing an unlimited number of audit
+ event log files.
+ </li>
+
+ <li class="li">
+ Use a cluster manager with governance capabilities to filter, visualize,
+ and produce reports based on the audit logs collected
+ from all the hosts in the cluster.
+ </li>
+ </ul>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="auditing__auditing_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Durability and Performance Considerations for Impala Auditing</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The auditing feature imposes performance overhead only while auditing is enabled.
+ </p>
+
+ <p class="p">
+ Because any Impala host can process a query, enable auditing on all hosts where the
+ <span class="ph"><span class="keyword cmdname">impalad</span> daemon</span>
+ runs. Each host stores its own log
+ files, in a directory in the local filesystem. The log data is periodically flushed to disk (through an
+ <code class="ph codeph">fsync()</code> system call) to avoid loss of audit data in case of a crash.
+ </p>
+
+ <p class="p">
+ The runtime overhead of auditing applies to whichever host serves as the coordinator
+ for the query, that is, the host you connect to when you issue the query. This might
+ be the same host for all queries, or different applications or users might connect to
+ and issue queries through different hosts.
+ </p>
+
+ <p class="p">
+ To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log
+ data (using the <code class="ph codeph">fsync()</code> system call) periodically rather than after
+ every query. Currently, the <code class="ph codeph">fsync()</code> calls are issued at a fixed
+ interval, every 5 seconds.
+ </p>
+
+ <p class="p">
+ By default, Impala avoids losing any audit log data in the case of an error during a logging operation
+ (such as a disk full error), by immediately shutting down
+ <span class="keyword cmdname">impalad</span> on the host where the auditing problem occurred.
+ <span class="ph">You can override this setting by specifying the option
+ <code class="ph codeph">--abort_on_failed_audit_event=false</code> in the <span class="keyword cmdname">impalad</span> startup options.</span>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="auditing__auditing_format">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Format of the Audit Log Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The audit log files represent the query information in JSON format, one query per line.
+ Typically, rather than looking at the log files themselves, you should use cluster-management
+ software to consolidate the log data from all Impala hosts and filter and visualize the results
+ in useful ways. (If you do examine the raw log data, you might run the files through
+ a JSON pretty-printer first.)
+ </p>
+
+ <p class="p">
+ All the information about schema objects accessed by the query is encoded in a single nested record on the
+ same line. For example, the audit log for an <code class="ph codeph">INSERT ... SELECT</code> statement records that a
+ select operation occurs on the source table and an insert operation occurs on the destination table. The
+ audit log for a query against a view records the base table accessed by the view, or multiple base tables
+ in the case of a view that includes a join query. Every Impala operation that corresponds to a SQL
+ statement is recorded in the audit logs, whether the operation succeeds or fails. Impala records more
+ information for a successful operation than for a failed one, because an unauthorized query is stopped
+ immediately, before all the query planning is completed.
+ </p>
+
+
+
+ <p class="p">
+ The information logged for each query includes:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Client session state:
+ <ul class="ul">
+ <li class="li">
+ Session ID
+ </li>
+
+ <li class="li">
+ User name
+ </li>
+
+ <li class="li">
+ Network address of the client connection
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ SQL statement details:
+ <ul class="ul">
+ <li class="li">
+ Query ID
+ </li>
+
+ <li class="li">
+ Statement Type - DML, DDL, and so on
+ </li>
+
+ <li class="li">
+ SQL statement text
+ </li>
+
+ <li class="li">
+ Execution start time, in local time
+ </li>
+
+ <li class="li">
+ Execution Status - Details on any errors that were encountered
+ </li>
+
+ <li class="li">
+ Target Catalog Objects:
+ <ul class="ul">
+ <li class="li">
+ Object Type - Table, View, or Database
+ </li>
+
+ <li class="li">
+ Fully qualified object name
+ </li>
+
+ <li class="li">
+ Privilege - How the object is being used (<code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>,
+ <code class="ph codeph">CREATE</code>, and so on)
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ </ul>
+
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="auditing__auditing_exceptions">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Which Operations Are Audited</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The kinds of SQL queries represented in the audit log are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Queries that are prevented due to lack of authorization.
+ </li>
+
+ <li class="li">
+ Queries that Impala can parse and analyze to determine that they are authorized. The audit data is
+ recorded immediately after Impala finishes its analysis, before the query is actually executed.
+ </li>
+ </ul>
+
+ <p class="p">
+ The audit log does not contain entries for queries that could not be parsed and analyzed. For example, a
+ query that fails due to a syntax error is not recorded in the audit log. The audit log also does not
+ contain queries that fail due to a reference to a table that does not exist, if you would be authorized to
+ access the table if it did exist.
+ </p>
+
+ <p class="p">
+ Certain statements in the <span class="keyword cmdname">impala-shell</span> interpreter, such as <code class="ph codeph">CONNECT</code>,
+ <code class="ph codeph">SUMMARY</code>, <code class="ph codeph">PROFILE</code>, <code class="ph codeph">SET</code>, and
+ <code class="ph codeph">QUIT</code>, do not correspond to actual SQL queries, and these statements are not reflected in
+ the audit log.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_authentication.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_authentication.html b/docs/build3x/html/topics/impala_authentication.html
new file mode 100644
index 0000000..b072c37
--- /dev/null
+++ b/docs/build3x/html/topics/impala_authentication.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_kerberos.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ldap.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mixed_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_delegation.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authentication"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Authentication</title></head><body id="authentication"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Authentication</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Authentication is the mechanism to ensure that only specified hosts and users can connect to Impala. It also
+ verifies that when clients connect to Impala, they are connected to a legitimate server. This feature
+ prevents spoofing such as <dfn class="term">impersonation</dfn> (setting up a phony client system with the same account
+ and group names as a legitimate user) and <dfn class="term">man-in-the-middle attacks</dfn> (intercepting application
+ requests before they reach Impala and eavesdropping on sensitive information in the requests or the results).
+ </p>
+
+ <p class="p">
+ Impala supports authentication using either Kerberos or LDAP.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+ owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+ databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+ </div>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ Once you are finished setting up authentication, move on to authorization, which involves specifying what
+ databases, tables, HDFS directories, and so on can be accessed by particular users when they connect through
+ Impala. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>