Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:59 UTC

[50/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_adls.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_adls.html b/docs/build3x/html/topics/impala_adls.html
new file mode 100644
index 0000000..4353825
--- /dev/null
+++ b/docs/build3x/html/topics/impala_adls.html
@@ -0,0 +1,638 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="adls"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with the Azure Data Lake Store (ADLS)</title></head><body id="adls"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Impala with the Azure Data Lake Store (ADLS)</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      You can use Impala to query data residing on the Azure Data Lake Store (ADLS) filesystem.
+      This capability allows convenient access to a storage system that is remotely managed,
+      accessible from anywhere, and integrated with various cloud-based services. Impala can
+      query files in any supported file format from ADLS. The ADLS storage location
+      can be for an entire table, or individual partitions in a partitioned table.
+    </p>
+
+    <p class="p">
+      The default Impala tables use data files stored on HDFS, which are ideal for bulk loads and queries using
+      full-table scans. In contrast, queries against ADLS data are less performant, making ADLS suitable for holding
+      <span class="q">"cold"</span> data that is only queried occasionally, while more frequently accessed <span class="q">"hot"</span> data resides in
+      HDFS. In a partitioned table, you can set the <code class="ph codeph">LOCATION</code> attribute for individual partitions
+      to put some partitions on HDFS and others on ADLS, typically depending on the age of the data.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="adls__prereqs">
+    <h2 class="title topictitle2" id="ariaid-title2">Prerequisites</h2>
+    <div class="body conbody">
+      <p class="p">
+        These procedures presume that you have already set up an Azure account,
+        configured an ADLS store, and configured your Hadoop cluster with appropriate
+        credentials to be able to access ADLS. See the following resources for information:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            <a class="xref" href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-portal" target="_blank">Get started with Azure Data Lake Store using the Azure Portal</a>
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            <a class="xref" href="https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html" target="_blank">Hadoop Azure Data Lake Support</a>
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="adls__sql">
+    <h2 class="title topictitle2" id="ariaid-title3">How Impala SQL Statements Work with ADLS</h2>
+    <div class="body conbody">
+      <p class="p">
+        Impala SQL statements work with data on ADLS as follows:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+            or <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> statements
+            can specify that a table resides on the ADLS filesystem by
+            encoding an <code class="ph codeph">adl://</code> prefix for the <code class="ph codeph">LOCATION</code>
+            property. <code class="ph codeph">ALTER TABLE</code> can also set the <code class="ph codeph">LOCATION</code>
+            property for an individual partition, so that some data in a table resides on
+            ADLS and other data in the same table resides on HDFS.
+          </p>
+          <div class="p">
+            The full format of the location URI is typically:
+<pre class="pre codeblock"><code>
+adl://<var class="keyword varname">your_account</var>.azuredatalakestore.net/<var class="keyword varname">rest_of_directory_path</var>
+</code></pre>
+          </div>
+        </li>
+        <li class="li">
+          <p class="p">
+            Once a table or partition is designated as residing on ADLS, the <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+            statement transparently accesses the data files from the appropriate storage layer.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            If the ADLS table is an internal table, the <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> statement
+            removes the corresponding data files from ADLS when the table is dropped.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> statement always removes the corresponding
+            data files from ADLS when the table is truncated.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> can move data files residing in HDFS into
+            an ADLS table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>, or the <code class="ph codeph">CREATE TABLE AS SELECT</code>
+            form of the <code class="ph codeph">CREATE TABLE</code> statement, can copy data from an HDFS table or another ADLS
+            table into an ADLS table.
+          </p>
+        </li>
+      </ul>
+      <p class="p">
+        For usage information about Impala SQL statements with ADLS tables, see <a class="xref" href="impala_adls.html#ddl">Creating Impala Databases, Tables, and Partitions for Data Stored on ADLS</a>
+        and <a class="xref" href="impala_adls.html#dml">Using Impala DML Statements for ADLS Data</a>.
+      </p>
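+      <p class="p">
+        As a quick illustration, the following sketch (the table and staging path are hypothetical)
+        uses <code class="ph codeph">LOAD DATA</code> to move files already in HDFS into a partition of an
+        ADLS-backed table:
+      </p>
+<pre class="pre codeblock"><code>-- Table name and staging path are hypothetical.
+[localhost:21000] &gt; load data inpath '/user/etl/staging/sales_2017'
+                  &gt;   into table sales_adls partition (year=2017);
+</code></pre>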
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="adls__creds">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Specifying Impala Credentials to Access Data in ADLS</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To allow Impala to access data in ADLS, specify values for the following configuration settings in your
+        <span class="ph filepath">core-site.xml</span> file:
+      </p>
+
+<pre class="pre codeblock"><code>
+&lt;property&gt;
+   &lt;name&gt;dfs.adls.oauth2.access.token.provider.type&lt;/name&gt;
+   &lt;value&gt;ClientCredential&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+   &lt;name&gt;dfs.adls.oauth2.client.id&lt;/name&gt;
+   &lt;value&gt;<var class="keyword varname">your_client_id</var>&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+   &lt;name&gt;dfs.adls.oauth2.credential&lt;/name&gt;
+   &lt;value&gt;<var class="keyword varname">your_client_secret</var>&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+   &lt;name&gt;dfs.adls.oauth2.refresh.url&lt;/name&gt;
+   &lt;value&gt;<var class="keyword varname">refresh_URL</var>&lt;/value&gt;
+&lt;/property&gt;
+
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+        <p class="p">
+          Check if your Hadoop distribution or cluster management tool includes support for
+          filling in and distributing credentials across the cluster in an automated way.
+        </p>
+      </div>
+
+      <p class="p">
+        After specifying the credentials, restart both the Impala and
+        Hive services. (Restarting Hive is required because Impala queries, CREATE TABLE statements, and so on go
+        through the Hive metastore.)
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="adls__etl">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Loading Data into ADLS for Impala Queries</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        If your ETL pipeline involves moving data into ADLS and then querying through Impala,
+        you can either use Impala DML statements to create, move, or copy the data, or
+        use the same data loading techniques as you would for non-Impala data.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="etl__dml">
+      <h3 class="title topictitle3" id="ariaid-title6">Using Impala DML Statements for ADLS Data</h3>
+      <div class="body conbody">
+        <p class="p">
+        In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+        Azure Data Lake Store (ADLS).
+        The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
+        partitions is specified by an <code class="ph codeph">adl://</code> prefix in the
+        <code class="ph codeph">LOCATION</code> attribute of
+        <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+        If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
+        issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the ADLS data.
+      </p>
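+        <p class="p">
+          For example, a minimal sketch (the table names are hypothetical) of writing into an
+          ADLS-backed partition; the statement is indistinguishable from the HDFS case:
+        </p>
+<pre class="pre codeblock"><code>-- Hypothetical tables; only the LOCATION of sales_adls puts the data on ADLS.
+[localhost:21000] &gt; insert into sales_adls partition (year=2017)
+                  &gt;   select id, amount from sales_staging;
+</code></pre>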
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="etl__manual_etl">
+      <h3 class="title topictitle3" id="ariaid-title7">Manually Loading Data into Impala Tables on ADLS</h3>
+      <div class="body conbody">
+        <p class="p">
+          As an alternative, you can use the Microsoft-provided methods to bring data files
+          into ADLS for querying through Impala. See
+          <a class="xref" href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-azure-storage-blob" target="_blank">the Microsoft ADLS documentation</a>
+          for details.
+        </p>
+
+        <p class="p">
+          After you upload data files to a location already mapped to an Impala table or partition, or if you delete
+          files in ADLS from such a location, issue the <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+          statement to make Impala aware of the new set of data files.
+        </p>
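+        <p class="p">
+          For example, a sketch using hypothetical file, store, and table names, with the
+          <span class="keyword cmdname">impala-shell</span> <code class="ph codeph">!</code> shell escape
+          to perform the upload (this assumes the ADLS connector is configured for the
+          <span class="keyword cmdname">hadoop</span> command):
+        </p>
+<pre class="pre codeblock"><code>-- Hypothetical file, store, and table names.
+[localhost:21000] &gt; ! hadoop fs -put /tmp/new_data.parq adl://impalademo.azuredatalakestore.net/usa_cities/;
+[localhost:21000] &gt; refresh usa_cities_adls;
+</code></pre>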
+
+      </div>
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="adls__ddl">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Creating Impala Databases, Tables, and Partitions for Data Stored on ADLS</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala reads data for a table or partition from ADLS based on the <code class="ph codeph">LOCATION</code> attribute for the
+        table or partition. Specify the ADLS details in the <code class="ph codeph">LOCATION</code> clause of a <code class="ph codeph">CREATE
+        TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement. The notation for the <code class="ph codeph">LOCATION</code>
+        clause is <code class="ph codeph">adl://<var class="keyword varname">store</var>/<var class="keyword varname">path/to/file</var></code>.
+      </p>
+
+      <p class="p">
+        For a partitioned table, either specify a separate <code class="ph codeph">LOCATION</code> clause for each new partition,
+        or specify a base <code class="ph codeph">LOCATION</code> for the table and set up a directory structure in ADLS to mirror
+        the way Impala partitioned tables are structured in HDFS. Although, strictly speaking, ADLS filenames do not
+        have directory paths, Impala treats ADLS filenames with <code class="ph codeph">/</code> characters the same as HDFS
+        pathnames that include directories.
+      </p>
+
+      <p class="p">
+        To point a nonpartitioned table or an individual partition at ADLS, specify a single directory
+        path in ADLS, which can be any directory. Replicating the structure of an entire Impala
+        partitioned table or database in ADLS requires more care, with directories and subdirectories nested and
+        named to match the equivalent directory tree in HDFS. Consider setting up an empty staging area in HDFS
+        if necessary, and recording the complete directory structure so that you can replicate it in ADLS.
+      </p>
+
+      <p class="p">
+        For example, the following session creates a partitioned table where only a single partition resides on ADLS.
+        The partitions for years 2013 and 2014 are located on HDFS. The partition for year 2015 includes a
+        <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">adl://</code> URL, and so refers to data residing on
+        ADLS, under a specific path underneath the store <code class="ph codeph">impalademo</code>.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create database db_on_hdfs;
+[localhost:21000] &gt; use db_on_hdfs;
+[localhost:21000] &gt; create table mostly_on_hdfs (x int) partitioned by (year int);
+[localhost:21000] &gt; alter table mostly_on_hdfs add partition (year=2013);
+[localhost:21000] &gt; alter table mostly_on_hdfs add partition (year=2014);
+[localhost:21000] &gt; alter table mostly_on_hdfs add partition (year=2015)
+                  &gt;   location 'adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3/t1';
+</code></pre>
+
+      <p class="p">
+        For convenience when working with multiple tables with data files stored in ADLS, you can create a database
+        with a <code class="ph codeph">LOCATION</code> attribute pointing to an ADLS path.
+        Specify a URL of the form <code class="ph codeph">adl://<var class="keyword varname">store</var>/<var class="keyword varname">root/path/for/database</var></code>
+        for the <code class="ph codeph">LOCATION</code> attribute of the database.
+        Any tables created inside that database
+        automatically create directories underneath the one specified by the database
+        <code class="ph codeph">LOCATION</code> attribute.
+      </p>
+
+      <p class="p">
+        The following session creates a database and two partitioned tables residing entirely on ADLS, one
+        partitioned by a single column and the other partitioned by multiple columns. Because a
+        <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">adl://</code> URL is specified for the database, the
+        tables inside that database are automatically created on ADLS underneath the database directory. To see the
+        names of the associated subdirectories, including the partition key values, we use an ADLS client tool to
+        examine how the directory structure is organized on ADLS. For example, Impala partition directories such as
+        <code class="ph codeph">month=1</code> do not include leading zeroes, which sometimes appear in partition directories created
+        through Hive.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create database db_on_adls location 'adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3';
+[localhost:21000] &gt; use db_on_adls;
+
+[localhost:21000] &gt; create table partitioned_on_adls (x int) partitioned by (year int);
+[localhost:21000] &gt; alter table partitioned_on_adls add partition (year=2013);
+[localhost:21000] &gt; alter table partitioned_on_adls add partition (year=2014);
+[localhost:21000] &gt; alter table partitioned_on_adls add partition (year=2015);
+
+[localhost:21000] &gt; ! hadoop fs -ls -R adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3;
+2015-03-17 13:56:34          0 dir1/dir2/dir3/
+2015-03-17 16:43:28          0 dir1/dir2/dir3/partitioned_on_adls/
+2015-03-17 16:43:49          0 dir1/dir2/dir3/partitioned_on_adls/year=2013/
+2015-03-17 16:43:53          0 dir1/dir2/dir3/partitioned_on_adls/year=2014/
+2015-03-17 16:43:58          0 dir1/dir2/dir3/partitioned_on_adls/year=2015/
+
+[localhost:21000] &gt; create table partitioned_multiple_keys (x int)
+                  &gt;   partitioned by (year smallint, month tinyint, day tinyint);
+[localhost:21000] &gt; alter table partitioned_multiple_keys
+                  &gt;   add partition (year=2015,month=1,day=1);
+[localhost:21000] &gt; alter table partitioned_multiple_keys
+                  &gt;   add partition (year=2015,month=1,day=31);
+[localhost:21000] &gt; alter table partitioned_multiple_keys
+                  &gt;   add partition (year=2015,month=2,day=28);
+
+[localhost:21000] &gt; ! hadoop fs -ls -R adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3;
+2015-03-17 13:56:34          0 dir1/dir2/dir3/
+2015-03-17 16:47:13          0 dir1/dir2/dir3/partitioned_multiple_keys/
+2015-03-17 16:47:44          0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=1/
+2015-03-17 16:47:50          0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=31/
+2015-03-17 16:47:57          0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=2/day=28/
+2015-03-17 16:43:28          0 dir1/dir2/dir3/partitioned_on_adls/
+2015-03-17 16:43:49          0 dir1/dir2/dir3/partitioned_on_adls/year=2013/
+2015-03-17 16:43:53          0 dir1/dir2/dir3/partitioned_on_adls/year=2014/
+2015-03-17 16:43:58          0 dir1/dir2/dir3/partitioned_on_adls/year=2015/
+</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">CREATE DATABASE</code> and <code class="ph codeph">CREATE TABLE</code> statements create the associated
+        directory paths if they do not already exist. You can specify multiple levels of directories, and the
+        <code class="ph codeph">CREATE</code> statement creates all appropriate levels, similar to using <code class="ph codeph">mkdir
+        -p</code>.
+      </p>
+
+      <p class="p">
+        Use the standard ADLS file upload methods to actually put the data files into the right locations. You can
+        also put the directory paths and data files in place before creating the associated Impala databases or
+        tables, and Impala automatically uses the data from the appropriate location after the associated databases
+        and tables are created.
+      </p>
+
+      <p class="p">
+        You can switch whether an existing table or partition points to data in HDFS or ADLS. For example, if you
+        have an Impala table or partition pointing to data files in HDFS or ADLS, and you later transfer those data
+        files to the other filesystem, use an <code class="ph codeph">ALTER TABLE</code> statement to adjust the
+        <code class="ph codeph">LOCATION</code> attribute of the corresponding table or partition to reflect that change. Because
+        Impala does not have an <code class="ph codeph">ALTER DATABASE</code> statement, this location-switching technique is not
+        practical for entire databases that have a custom <code class="ph codeph">LOCATION</code> attribute.
+      </p>
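+      <p class="p">
+        For example, after copying the <code class="ph codeph">year=2013</code> data files from HDFS to ADLS,
+        a sketch like the following (the ADLS path is illustrative) repoints that partition of the
+        <code class="ph codeph">mostly_on_hdfs</code> table created earlier:
+      </p>
+<pre class="pre codeblock"><code>-- The ADLS path is illustrative; copy the data files there first.
+[localhost:21000] &gt; alter table mostly_on_hdfs partition (year=2013)
+                  &gt;   set location 'adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3/year=2013';
+</code></pre>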
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="adls__internal_external">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Internal and External Tables Located on ADLS</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Just as with tables located on HDFS storage, you can designate ADLS-based tables as either internal (managed
+        by Impala) or external, by using the syntax <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">CREATE EXTERNAL
+        TABLE</code> respectively. When you drop an internal table, the files associated with the table are
+        removed, even if they are on ADLS storage. When you drop an external table, the files associated with the
+        table are left alone, and are still available for access by other tools or components. See
+        <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details.
+      </p>
+
+      <p class="p">
+        If the data on ADLS is intended to be long-lived and accessed by other tools in addition to Impala, create
+        any associated ADLS tables with the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, so that the files are not
+        deleted from ADLS when the table is dropped.
+      </p>
+
+      <p class="p">
+        If the data on ADLS is only needed for querying by Impala and can be safely discarded once the Impala
+        workflow is complete, create the associated ADLS tables using the <code class="ph codeph">CREATE TABLE</code> syntax, so
+        that dropping the table also deletes the corresponding data files on ADLS.
+      </p>
+
+      <p class="p">
+        For example, this session creates a table in ADLS with the same column layout as a table in HDFS, then
+        examines the ADLS table and queries some data from it. The table in ADLS works the same as a table in HDFS as
+        far as the expected file format of the data, table and column statistics, and other table properties. The
+        only indication that it is not an HDFS table is the <code class="ph codeph">adl://</code> URL in the
+        <code class="ph codeph">LOCATION</code> property. Many data files can reside in the ADLS directory, and their combined
+        contents form the table data. Because the data in this example is uploaded after the table is created, a
+        <code class="ph codeph">REFRESH</code> statement prompts Impala to update its cached information about the data files.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table usa_cities_adls like usa_cities location 'adl://impalademo.azuredatalakestore.net/usa_cities';
+[localhost:21000] &gt; desc usa_cities_adls;
++-------+----------+---------+
+| name  | type     | comment |
++-------+----------+---------+
+| id    | smallint |         |
+| city  | string   |         |
+| state | string   |         |
++-------+----------+---------+
+
+-- Now from a web browser, upload the same data file(s) to ADLS as in the HDFS table,
+-- under the relevant store and path. If you already have the data in ADLS, you would
+-- point the table LOCATION at an existing path.
+
+[localhost:21000] &gt; refresh usa_cities_adls;
+[localhost:21000] &gt; select count(*) from usa_cities_adls;
++----------+
+| count(*) |
++----------+
+| 289      |
++----------+
+[localhost:21000] &gt; select distinct state from usa_cities_adls limit 5;
++----------------------+
+| state                |
++----------------------+
+| Louisiana            |
+| Minnesota            |
+| Georgia              |
+| Alaska               |
+| Ohio                 |
++----------------------+
+[localhost:21000] &gt; desc formatted usa_cities_adls;
++------------------------------+----------------------------------------------------+---------+
+| name                         | type                                               | comment |
++------------------------------+----------------------------------------------------+---------+
+| # col_name                   | data_type                                          | comment |
+|                              | NULL                                               | NULL    |
+| id                           | smallint                                           | NULL    |
+| city                         | string                                             | NULL    |
+| state                        | string                                             | NULL    |
+|                              | NULL                                               | NULL    |
+| # Detailed Table Information | NULL                                               | NULL    |
+| Database:                    | adls_testing                                       | NULL    |
+| Owner:                       | jrussell                                           | NULL    |
+| CreateTime:                  | Mon Mar 16 11:36:25 PDT 2017                       | NULL    |
+| LastAccessTime:              | UNKNOWN                                            | NULL    |
+| Protect Mode:                | None                                               | NULL    |
+| Retention:                   | 0                                                  | NULL    |
+| Location:                    | adl://impalademo.azuredatalakestore.net/usa_cities | NULL    |
+| Table Type:                  | MANAGED_TABLE                                      | NULL    |
+...
++------------------------------+----------------------------------------------------+---------+
+</code></pre>
+
+      <p class="p">
+        In this case, we have already uploaded a Parquet file with a million rows of data to the
+        <code class="ph codeph">sample_data</code> directory underneath the <code class="ph codeph">impalademo</code> store on ADLS. This
+        session creates a table with matching column settings pointing to the corresponding location in ADLS, then
+        queries the table. Because the data is already in place on ADLS when the table is created, no
+        <code class="ph codeph">REFRESH</code> statement is required.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table sample_data_adls
+                  &gt; (id bigint, val int, zerofill string,
+                  &gt; name string, assertion boolean, city string, state string)
+                  &gt; stored as parquet location 'adl://impalademo.azuredatalakestore.net/sample_data';
+[localhost:21000] &gt; select count(*) from sample_data_adls;
++----------+
+| count(*) |
++----------+
+| 1000000  |
++----------+
+[localhost:21000] &gt; select count(*) howmany, assertion from sample_data_adls group by assertion;
++---------+-----------+
+| howmany | assertion |
++---------+-----------+
+| 667149  | true      |
+| 332851  | false     |
++---------+-----------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="adls__queries">
+
+    <h2 class="title topictitle2" id="ariaid-title10">Running and Tuning Impala Queries for Data Stored on ADLS</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Once the appropriate <code class="ph codeph">LOCATION</code> attributes are set up at the table or partition level, you
+        query data stored in ADLS exactly the same as data stored on HDFS or in HBase:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Queries against ADLS data support all the same file formats as for HDFS data.
+        </li>
+
+        <li class="li">
+          Tables can be unpartitioned or partitioned. For partitioned tables, either manually construct paths in ADLS
+          corresponding to the HDFS directories representing partition key values, or use <code class="ph codeph">ALTER TABLE ...
+          ADD PARTITION</code> to set up the appropriate paths in ADLS.
+        </li>
+
+        <li class="li">
+          HDFS, Kudu, and HBase tables can be joined to ADLS tables, or ADLS tables can be joined with each other.
+        </li>
+
+        <li class="li">
+          Authorization using the Sentry framework to control access to databases, tables, or columns works the
+          same whether the data is in HDFS or in ADLS.
+        </li>
+
+        <li class="li">
+          The <span class="keyword cmdname">catalogd</span> daemon caches metadata for both HDFS and ADLS tables. Use
+          <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> for ADLS tables in the same situations
+          where you would issue those statements for HDFS tables.
+        </li>
+
+        <li class="li">
+          Queries against ADLS tables are subject to the same kinds of admission control and resource management as
+          HDFS tables.
+        </li>
+
+        <li class="li">
+          Metadata about ADLS tables is stored in the same metastore database as for HDFS tables.
+        </li>
+
+        <li class="li">
+          You can set up views referring to ADLS tables, the same as for HDFS tables.
+        </li>
+
+        <li class="li">
+          The <code class="ph codeph">COMPUTE STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN
+          STATS</code> statements work for ADLS tables also.
+        </li>
+      </ul>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="queries__performance">
+
+      <h3 class="title topictitle3" id="ariaid-title11">Understanding and Tuning Impala Query Performance for ADLS Data</h3>
+
+
+      <div class="body conbody">
+
+        <p class="p">
+          Although Impala queries for data stored in ADLS might be less performant than queries against the
+          equivalent data stored in HDFS, you can still do some tuning. Here are techniques you can use to
+          interpret explain plans and profiles for queries against ADLS data, and tips to achieve the best
+          performance possible for such queries.
+        </p>
+
+        <p class="p">
+          All else being equal, performance is expected to be lower for queries running against data on ADLS rather
+          than HDFS. The actual mechanics of the <code class="ph codeph">SELECT</code> statement are somewhat different when the
+          data is in ADLS. Although the work is still distributed across the DataNodes of the cluster, Impala might
+          parallelize the work for a distributed query differently for data on HDFS and ADLS. ADLS does not have the
+          same block notion as HDFS, so Impala uses heuristics to determine how to split up large ADLS files for
+          processing in parallel. Because all hosts can access any ADLS data file with equal efficiency, the
+          distribution of work might be different than for HDFS data, where the data blocks are physically read
+          using short-circuit local reads by hosts that contain the appropriate block replicas. Although the I/O to
+          read the ADLS data might be spread evenly across the hosts of the cluster, the fact that all data is
+          initially retrieved across the network means that the overall query performance is likely to be lower for
+          ADLS data than for HDFS data.
+        </p>
+
+        <p class="p">
+        Because ADLS does not expose the block sizes of data files the way HDFS does,
+        any Impala <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+        use the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option setting to define the size of
+        Parquet data files. (Using a large block size is more important for Parquet tables than
+        for tables that use other file formats.)
+      </p>
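+        <p class="p">
+          For example, a hedged sketch; the 256 MB value and the source table name are illustrative,
+          not recommendations:
+        </p>
+<pre class="pre codeblock"><code>-- Illustrative file size and source table.
+[localhost:21000] &gt; set parquet_file_size=256m;
+[localhost:21000] &gt; insert overwrite table sample_data_adls select * from sample_data;
+</code></pre>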
+
+        <p class="p">
+          When optimizing aspects of complex queries, such as the join order, Impala treats tables on HDFS and
+          ADLS the same way. Therefore, follow all the same tuning recommendations for ADLS tables as for HDFS ones,
+          such as using the <code class="ph codeph">COMPUTE STATS</code> statement to help Impala construct accurate estimates of
+          row counts and cardinality. See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details.
+        </p>
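+        <p class="p">
+          For example, using the ADLS table created earlier in this topic:
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats sample_data_adls;
+[localhost:21000] &gt; show table stats sample_data_adls;
+</code></pre>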
+
+        <p class="p">
+          In query profile reports, the numbers for <code class="ph codeph">BytesReadLocal</code>,
+          <code class="ph codeph">BytesReadShortCircuit</code>, <code class="ph codeph">BytesReadDataNodeCached</code>, and
+          <code class="ph codeph">BytesReadRemoteUnexpected</code> are blank because those metrics come from HDFS.
+          If you do see any indications that a query against an ADLS table performed <span class="q">"remote read"</span>
+          operations, do not be alarmed. That is expected because, by definition, all the I/O for ADLS tables involves
+          remote reads.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="adls__restrictions">
+
+    <h2 class="title topictitle2" id="ariaid-title12">Restrictions on Impala Support for ADLS</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala requires that the default filesystem for the cluster be HDFS. You cannot use ADLS as the only
+        filesystem in the cluster.
+      </p>
+
+      <p class="p">
+        Although ADLS is often used to store JSON-formatted data, the current Impala support for ADLS does not include
+        directly querying JSON data. For Impala queries, use data files in one of the file formats listed in
+        <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>. If you have data in JSON format, you can prepare a
+        flattened version of that data for querying by Impala as part of your ETL cycle.
+      </p>
+
+      <p class="p">
+        You cannot use the <code class="ph codeph">ALTER TABLE ... SET CACHED</code> statement for tables or partitions that are
+        located in ADLS.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="adls__best_practices">
+    <h2 class="title topictitle2" id="ariaid-title13">Best Practices for Using Impala with ADLS</h2>
+
+    <div class="body conbody">
+      <p class="p">
+        The following guidelines represent best practices derived from testing and real-world experience with Impala on ADLS:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Any reference to an ADLS location must be fully qualified, including the store name.
+            (Because ADLS cannot be the default filesystem for the cluster, unqualified paths
+            always resolve to HDFS.)
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Set any appropriate configuration settings for <span class="keyword cmdname">impalad</span>.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+  </article>
+
+</article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_admin.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_admin.html b/docs/build3x/html/topics/impala_admin.html
new file mode 100644
index 0000000..7c76987
--- /dev/null
+++ b/docs/build3x/html/topics/impala_admin.html
@@ -0,0 +1,52 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admission.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_resource_management.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_timeouts.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_proxy.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disk_space.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admin"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Admini
 stration</title></head><body id="admin"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Administration</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+      As an administrator, you monitor Impala's use of resources and take action when necessary to keep Impala
+      running smoothly and avoid conflicts with other Hadoop components running on the same cluster. When you
+      detect that an issue has happened or could happen in the future, you reconfigure Impala or other components
+      such as HDFS or even the hardware of the cluster itself to resolve or avoid problems.
+    </p>
+
+    <p class="p toc"></p>
+
+    <p class="p">
+      <strong class="ph b">Related tasks:</strong>
+    </p>
+
+    <p class="p">
+      As an administrator, you can expect to perform installation, upgrade, and configuration tasks for Impala on
+      all machines in a cluster. See <a class="xref" href="impala_install.html#install">Installing Impala</a>,
+      <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a>, and <a class="xref" href="impala_config.html#config">Managing Impala</a> for details.
+    </p>
+
+    <p class="p">
+      For security tasks typically performed by administrators, see <a class="xref" href="impala_security.html#security">Impala Security</a>.
+    </p>
+
+    <div class="p">
+      Administrators also decide how to allocate cluster resources so that all Hadoop components can run smoothly
+      together. For Impala, this task primarily involves:
+      <ul class="ul">
+        <li class="li">
+          Deciding how many Impala queries can run concurrently and with how much memory, through the admission
+          control feature. See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+        </li>
+
+        <li class="li">
+          Dividing cluster resources such as memory between Impala and other components, using YARN for overall
+          resource management, and Llama to mediate resource requests from Impala to YARN. See
+          <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for details.
+        </li>
+      </ul>
+    </div>
+
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_admission.html">Admission Control and Query Queuing</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_resource_management.html">Resource Management for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_timeouts.html">Setting Timeout Periods for Daemons, Queries, and Sessions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_proxy.html">Using Impala through a Proxy for High Availability</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disk_space.html">Managing Disk Space for Impala Data</a></strong><br></li></ul></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_admission.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_admission.html b/docs/build3x/html/topics/impala_admission.html
new file mode 100644
index 0000000..9eff7ea
--- /dev/null
+++ b/docs/build3x/html/topics/impala_admission.html
@@ -0,0 +1,822 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admission_control"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Admission Control and Query Queuing</title></head><body id="admission_control"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Admission Control and Query Queuing</h1>
+
+
+  <div class="body conbody">
+
+    <p class="p" id="admission_control__admission_control_intro">
+      Admission control is an Impala feature that imposes limits on concurrent SQL queries, to avoid resource usage
+      spikes and out-of-memory conditions on busy clusters.
+      It is a form of <span class="q">"throttling"</span>.
+      New queries are accepted and executed until
+      certain thresholds are reached, such as too many concurrent queries or too much
+      total memory in use across the cluster.
+      When one of these thresholds is reached,
+      incoming queries wait to begin execution. These queries are
+      queued and are admitted (that is, begin executing) when resources become available.
+    </p>
+    <p class="p">
+      In addition to the threshold values for currently executing queries,
+      you can place limits on the maximum number of queries that are
+      queued (waiting) and a limit on the amount of time they might wait
+      before returning with an error. These queue settings let you ensure that queries do
+      not wait indefinitely, so that you can detect and correct <span class="q">"starvation"</span> scenarios.
+    </p>
+    <p class="p">
+      Enable this feature if your cluster is
+      underutilized at some times and overutilized at others. Overutilization is indicated by performance
+      bottlenecks and queries being cancelled due to out-of-memory conditions, when those same queries are
+      successful and perform well during times with less concurrent load. Admission control works as a safeguard to
+      avoid out-of-memory conditions during heavy concurrent usage.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+        <p class="p">
+          The use of the Llama component for integrated resource management within YARN
+          is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+          The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+        </p>
+        <p class="p">
+          For clusters running Impala alongside
+          other data management components, you define static service pools to define the resources
+          available to Impala and other components. Then within the area allocated for Impala,
+          you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+        </p>
+      </div>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="admission_control__admission_intro">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of Impala Admission Control</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        On a busy cluster, you might find there is an optimal number of Impala queries that run concurrently.
+        For example, when the I/O capacity is fully utilized by I/O-intensive queries,
+        you might not find any throughput benefit in running more concurrent queries.
+        By allowing some queries to run at full speed while others wait, rather than having
+        all queries contend for resources and run slowly, admission control can result in higher overall throughput.
+      </p>
+
+      <p class="p">
+        For another example, consider a memory-bound workload such as many large joins or aggregation queries.
+        Each such query could briefly use many gigabytes of memory to process intermediate results.
+        Because Impala by default cancels queries that exceed the specified memory limit,
+        running multiple large-scale queries at once might require
+        re-running some queries that are cancelled. In this case, admission control improves the
+        reliability and stability of the overall workload by only allowing as many concurrent queries
+        as the overall memory of the cluster can accommodate.
+      </p>
+
+      <p class="p">
+        The admission control feature lets you set an upper limit on the number of concurrent Impala
+        queries and on the memory used by those queries. Any additional queries are queued until the earlier ones
+        finish, rather than being cancelled or running slowly and causing contention. As other queries finish, the
+        queued queries are allowed to proceed.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, you can specify these limits and thresholds for each
+        pool rather than globally. That way, you can balance the resource usage and throughput
+        between steady well-defined workloads, rare resource-intensive queries, and ad hoc
+        exploratory queries.
+      </p>
+
+      <p class="p">
+        For details on the internal workings of admission control, see
+        <a class="xref" href="impala_admission.html#admission_architecture">How Impala Schedules and Enforces Limits on Concurrent Queries</a>.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="admission_control__admission_concurrency">
+    <h2 class="title topictitle2" id="ariaid-title3">Concurrent Queries and Admission Control</h2>
+    <div class="body conbody">
+      <p class="p">
+        One way to limit resource usage through admission control is to set an upper limit
+        on the number of concurrent queries. This is the initial technique you might use
+        when you do not have extensive information about memory usage for your workload.
+        This setting can be specified separately for each dynamic resource pool.
+      </p>
+      <p class="p">
+        You can combine this setting with the memory-based approach described in
+        <a class="xref" href="impala_admission.html#admission_memory">Memory Limits and Admission Control</a>. If either the maximum number of
+        or the expected memory usage of the concurrent queries is exceeded, subsequent queries
+        are queued until the concurrent workload falls below the threshold again.
+      </p>
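+      <p class="p">
+        For clusters where pools are configured through configuration files rather than a management UI,
+        a plausible sketch of the <span class="ph filepath">llama-site.xml</span> settings that cap the
+        running and queued queries for a pool (the pool name <code class="ph codeph">root.default</code> and
+        the values shown are illustrative):
+      </p>
+<pre class="pre codeblock"><code>
+&lt;!-- Pool name and limits are illustrative. --&gt;
+&lt;property&gt;
+   &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.default&lt;/name&gt;
+   &lt;value&gt;10&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+   &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.default&lt;/name&gt;
+   &lt;value&gt;50&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>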
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="admission_control__admission_memory">
+    <h2 class="title topictitle2" id="ariaid-title4">Memory Limits and Admission Control</h2>
+    <div class="body conbody">
+      <p class="p">
+        Each dynamic resource pool can have an upper limit on the cluster-wide memory used by queries executing in that pool.
+        This is the technique to use once you have a stable workload with well-understood memory requirements.
+      </p>
+      <p class="p">
+        Always specify the <span class="ph uicontrol">Default Query Memory Limit</span> for the expected maximum amount of RAM
+        that a query might require on each host, which is equivalent to setting the <code class="ph codeph">MEM_LIMIT</code>
+        query option for every query run in that pool. That value affects the execution of each query, preventing it
+        from overallocating memory on each host, and potentially activating the spill-to-disk mechanism or cancelling
+        the query when necessary.
+      </p>
+      <p class="p">
+        Optionally, specify the <span class="ph uicontrol">Max Memory</span> setting, a cluster-wide limit that determines
+        how many queries can be safely run concurrently, based on the upper memory limit per host multiplied by the
+        number of Impala nodes in the cluster.
+      </p>
+      <div class="p">
+        For example, consider the following scenario:
+        <ul class="ul">
+          <li class="li"> The cluster is running <span class="keyword cmdname">impalad</span> daemons on five
+            DataNodes. </li>
+          <li class="li"> A dynamic resource pool has <span class="ph uicontrol">Max Memory</span> set
+            to 100 GB. </li>
+          <li class="li"> The <span class="ph uicontrol">Default Query Memory Limit</span> for the
+            pool is 10 GB. Therefore, any query running in this pool could use
+            up to 50 GB of memory (default query memory limit * number of Impala
+            nodes). </li>
+          <li class="li"> The maximum number of queries that Impala executes concurrently
+            within this dynamic resource pool is two, which is the most that
+            could be accommodated within the 100 GB <span class="ph uicontrol">Max
+              Memory</span> cluster-wide limit. </li>
+          <li class="li"> There is no memory penalty if queries use less memory than the
+              <span class="ph uicontrol">Default Query Memory Limit</span> per-host setting
+            or the <span class="ph uicontrol">Max Memory</span> cluster-wide limit. These
+            values are only used to estimate how many queries can be run
+            concurrently within the resource constraints for the pool. </li>
+        </ul>
+      </div>
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span>  If you specify <span class="ph uicontrol">Max
+          Memory</span> for an Impala dynamic resource pool, you must also
+        specify the <span class="ph uicontrol">Default Query Memory Limit</span>.
+          <span class="ph uicontrol">Max Memory</span> relies on the <span class="ph uicontrol">Default
+          Query Memory Limit</span> to produce a reliable estimate of
+        overall memory consumption for a query. </div>
+      <p class="p">
+        You can combine the memory-based settings with the upper limit on concurrent queries described in
+        <a class="xref" href="impala_admission.html#admission_concurrency">Concurrent Queries and Admission Control</a>. If either the maximum number of
+        or the expected memory usage of the concurrent queries is exceeded, subsequent queries
+        are queued until the concurrent workload falls below the threshold again.
+      </p>
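+      <p class="p">
+        The same per-host cap can also be set for an individual session through the
+        <code class="ph codeph">MEM_LIMIT</code> query option; a hedged sketch, with an illustrative value:
+      </p>
+<pre class="pre codeblock"><code>-- Illustrative value, matching the pool's per-host Default Query Memory Limit.
+[localhost:21000] &gt; set mem_limit=10g;
+</code></pre>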
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="admission_control__admission_yarn">
+
+    <h2 class="title topictitle2" id="ariaid-title5">How Impala Admission Control Relates to Other Resource Management Tools</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        The admission control feature is similar in some ways to the YARN resource management framework. These features
+        can be used separately or together. This section describes some similarities and differences, to help you
+        decide which combination of resource management features to use for Impala.
+      </p>
+
+      <p class="p">
+        Admission control is a lightweight, decentralized system that is suitable for workloads consisting
+        primarily of Impala queries and other SQL statements. It sets <span class="q">"soft"</span> limits that smooth out Impala
+        memory usage during times of heavy load, rather than taking an all-or-nothing approach that cancels jobs
+        that are too resource-intensive.
+      </p>
+
+      <p class="p">
+        Because the admission control system does not interact with other Hadoop workloads such as MapReduce jobs, you
+        might use YARN with static service pools on clusters where resources are shared between
+        Impala and other Hadoop components. This configuration is recommended when using Impala in a
+        <dfn class="term">multitenant</dfn> cluster. Devote a percentage of cluster resources to Impala, and allocate another
+        percentage for MapReduce and other batch-style workloads. Let admission control handle the concurrency and
+        memory usage for the Impala work within the cluster, and let YARN manage the work for other components within the
+        cluster. In this scenario, Impala's resources are not managed by YARN.
+      </p>
+
+      <p class="p">
+        The Impala admission control feature uses the same configuration mechanism as the YARN resource manager to map users to
+        pools and authenticate them.
+      </p>
+
+      <p class="p">
+        Although the Impala admission control feature uses a <code class="ph codeph">fair-scheduler.xml</code> configuration file
+        behind the scenes, this file does not depend on which scheduler is used for YARN. You still use this file
+        even when YARN is using the capacity scheduler.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="admission_control__admission_architecture">
+
+    <h2 class="title topictitle2" id="ariaid-title6">How Impala Schedules and Enforces Limits on Concurrent Queries</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        The admission control system is decentralized, embedded in each Impala daemon and communicating through the
+        statestore mechanism. Although the limits you set for memory usage and number of concurrent queries apply
+        cluster-wide, each Impala daemon makes its own decisions about whether to allow each query to run
+        immediately or to queue it for a less-busy time. These decisions are fast, meaning the admission control
+        mechanism is low-overhead, but might be imprecise during times of heavy load across many coordinators. There could be times when
+        more queries are queued (in aggregate across the cluster) than the specified limit, or when the number of admitted queries
+        exceeds the expected number. Thus, you typically err on the high side for the size of the queue, because there is not a big
+        penalty for having a large number of queued queries; and you typically err on the low side when configuring memory resources,
+        leaving some headroom so that queries do not run out of memory and get cancelled if more queries are admitted than expected.
+      </p>
+
+
+
+      <p class="p">
+        To avoid a large backlog of queued requests, you can set an upper limit on the size of the queue for
+        queries that are queued. When the number of queued queries exceeds this limit, further queries are
+        cancelled rather than being queued. You can also configure a timeout period per pool, after which queued queries are
+        cancelled, to avoid indefinite waits. If a cluster reaches this state where queries are cancelled due to
+        too many concurrent requests or long waits for query execution to begin, that is a signal for an
+        administrator to take action, either by provisioning more resources, scheduling work on the cluster to
+        smooth out the load, or by doing <a class="xref" href="impala_performance.html#performance">Impala performance
+        tuning</a> to enable higher throughput.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="admission_control__admission_jdbc_odbc">
+
+    <h2 class="title topictitle2" id="ariaid-title7">How Admission Control works with Impala Clients (JDBC, ODBC, HiveServer2)</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        Most aspects of admission control work transparently with client interfaces such as JDBC and ODBC:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          If a SQL statement is put into a queue rather than running immediately, the API call blocks until the
+          statement is dequeued and begins execution. At that point, the client program can request to fetch
+          results, which might also block until results become available.
+        </li>
+
+        <li class="li">
+          If a SQL statement is cancelled because it has been queued for too long or because it exceeded the memory
+          limit during execution, the error is returned to the client program with a descriptive error message.
+        </li>
+
+      </ul>
+
+      <p class="p">
+        In Impala 2.0 and higher, you can submit
+        a SQL <code class="ph codeph">SET</code> statement from the client application
+        to change the <code class="ph codeph">REQUEST_POOL</code> query option.
+        This option lets you submit queries to different resource pools,
+        as described in <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a>.
+
+      </p>
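+
+      <p class="p">
+        For example, a client application might issue a <code class="ph codeph">SET</code> statement such as the
+        following before submitting queries. (The pool name <code class="ph codeph">root.development</code> is
+        only an illustration; substitute a pool defined in your cluster.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Direct subsequent queries from this session to a specific resource pool.
+set request_pool=root.development;
+select count(*) from web_logs;
+</code></pre>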
+
+      <p class="p">
+        At any time, the set of queued queries could include queries submitted through multiple different Impala
+        daemon hosts. All the queries submitted through a particular host will be executed in order, so a
+        <code class="ph codeph">CREATE TABLE</code> followed by an <code class="ph codeph">INSERT</code> on the same table would succeed.
+        Queries submitted through different hosts are not guaranteed to be executed in the order they were
+        received. Therefore, if you are using load-balancing or other round-robin scheduling where different
+        statements are submitted through different hosts, set up all table structures ahead of time so that the
+        statements controlled by the queuing system are primarily queries, where order is not significant. Or, if a
+        sequence of statements needs to happen in strict order (such as an <code class="ph codeph">INSERT</code> followed by a
+        <code class="ph codeph">SELECT</code>), submit all those statements through a single session, while connected to the same
+        Impala daemon host.
+      </p>
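+
+      <p class="p">
+        As a sketch of that last pattern (the table and column names are illustrative), the following
+        statements are submitted through a single session connected to one coordinator, so they run in order:
+      </p>
+
+<pre class="pre codeblock"><code>-- All statements go through the same session and Impala daemon,
+-- so the INSERT is guaranteed to run after the CREATE TABLE.
+create table staging_events (id bigint, payload string);
+insert into staging_events select id, payload from raw_events;
+select count(*) from staging_events;
+</code></pre>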
+
+      <p class="p">
+        Admission control has the following limitations or special behavior when used with JDBC or ODBC
+        applications:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The other resource-related query options,
+          <code class="ph codeph">RESERVATION_REQUEST_TIMEOUT</code> and <code class="ph codeph">V_CPU_CORES</code>, are no longer used. Those query options only
+          applied to using Impala with Llama, which is no longer supported.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="admission_control__admission_schema_config">
+    <h2 class="title topictitle2" id="ariaid-title8">SQL and Schema Considerations for Admission Control</h2>
+    <div class="body conbody">
+      <p class="p">
+        When queries complete quickly and are tuned for optimal memory usage, there is less chance of
+        performance or capacity problems during times of heavy load. Before setting up admission control,
+        tune your Impala queries to ensure that the query plans are efficient and the memory estimates
+        are accurate. Understanding the nature of your workload, and which queries are the most
+        resource-intensive, helps you to plan how to divide the queries into different pools and
+        decide what limits to define for each pool.
+      </p>
+      <p class="p">
+        For large tables, especially those involved in join queries, keep their statistics up to date
+        after loading substantial amounts of new data or adding new partitions.
+        Use the <code class="ph codeph">COMPUTE STATS</code> statement for unpartitioned tables, and
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> for partitioned tables.
+      </p>
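+
+      <p class="p">
+        For example (the table names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Unpartitioned table: gather table and column statistics in one pass.
+compute stats sales_history;
+-- Partitioned table: gather statistics only for new or changed partitions.
+compute incremental stats sales_by_day;
+</code></pre>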
+      <p class="p">
+        When you use dynamic resource pools with a <span class="ph uicontrol">Max Memory</span> setting enabled,
+        you typically override the memory estimates that Impala makes based on the statistics from the
+        <code class="ph codeph">COMPUTE STATS</code> statement.
+        You either set the <code class="ph codeph">MEM_LIMIT</code> query option within a particular session to
+        set an upper memory limit for queries within that session, or a default <code class="ph codeph">MEM_LIMIT</code>
+        setting for all queries processed by the <span class="keyword cmdname">impalad</span> instance, or
+        a default <code class="ph codeph">MEM_LIMIT</code> setting for all queries assigned to a particular
+        dynamic resource pool. By designating a consistent memory limit for a set of similar queries
+        that use the same resource pool, you avoid unnecessary query queuing or out-of-memory conditions
+        that can arise during high-concurrency workloads when memory estimates for some queries are inaccurate.
+      </p>
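+
+      <p class="p">
+        For example, within a session you might cap each query with a statement such as the following.
+        (The 2 GB value is only an illustration; base the real value on the memory estimates and actual
+        usage reported for your workload.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Cap each subsequent query in this session at 2 GB per node.
+set mem_limit=2g;
+select count(*) from huge_table join enormous_table using (id);
+-- A value of 0 removes the session-level cap.
+set mem_limit=0;
+</code></pre>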
+      <p class="p">
+        Follow other steps from <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> to tune your queries.
+      </p>
+    </div>
+  </article>
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="admission_control__admission_config">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Configuring Admission Control</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        The configuration options for admission control range from the simple (a single resource pool with a single
+        set of options) to the complex (multiple resource pools with different options, each pool handling queries
+        for a different set of users and groups).
+      </p>
+
+      <section class="section" id="admission_config__admission_flags"><h3 class="title sectiontitle">Impala Service Flags for Admission Control (Advanced)</h3>
+
+
+
+        <p class="p">
+          The following Impala configuration options let you adjust the settings of the admission control feature. When supplying the
+          options on the <span class="keyword cmdname">impalad</span> command line, prepend the option name with <code class="ph codeph">--</code>.
+        </p>
+
+        <dl class="dl" id="admission_config__admission_control_option_list">
+
+            <dt class="dt dlterm" id="admission_config__queue_wait_timeout_ms">
+              <code class="ph codeph">queue_wait_timeout_ms</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Maximum amount of time (in milliseconds) that a
+              request waits to be admitted before timing out.
+              <p class="p">
+                <strong class="ph b">Type:</strong> <code class="ph codeph">int64</code>
+              </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong> <code class="ph codeph">60000</code>
+              </p>
+            </dd>
+
+
+            <dt class="dt dlterm" id="admission_config__default_pool_max_requests">
+              <code class="ph codeph">default_pool_max_requests</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Maximum number of concurrent outstanding requests
+              allowed to run before incoming requests are queued. Because this
+              limit applies cluster-wide, but each Impala node makes independent
+              decisions to run queries immediately or queue them, it is a soft
+              limit; the overall number of concurrent queries might be slightly
+              higher during times of heavy load. A negative value indicates no
+              limit. Ignored if <code class="ph codeph">fair_scheduler_config_path</code> and
+                <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+                <strong class="ph b">Type:</strong>
+                <code class="ph codeph">int64</code>
+              </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <span class="ph">-1, meaning unlimited (prior to <span class="keyword">Impala 2.5</span> the default was 200)</span>
+              </p>
+            </dd>
+
+
+            <dt class="dt dlterm" id="admission_config__default_pool_max_queued">
+              <code class="ph codeph">default_pool_max_queued</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Maximum number of requests allowed to be queued
+              before rejecting requests. Because this limit applies
+              cluster-wide, but each Impala node makes independent decisions to
+              run queries immediately or queue them, it is a soft limit; the
+              overall number of queued queries might be slightly higher during
+              times of heavy load. A negative value or 0 indicates requests are
+              always rejected once the maximum concurrent requests are
+              executing. Ignored if <code class="ph codeph">fair_scheduler_config_path</code>
+              and <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+                <strong class="ph b">Type:</strong>
+                <code class="ph codeph">int64</code>
+              </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <span class="ph">unlimited</span>
+              </p>
+            </dd>
+
+
+            <dt class="dt dlterm" id="admission_config__default_pool_mem_limit">
+              <code class="ph codeph">default_pool_mem_limit</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Maximum amount of memory (across the entire
+              cluster) that all outstanding requests in this pool can use before
+              new requests to this pool are queued. Specified in bytes,
+              megabytes, or gigabytes by a number followed by the suffix
+                <code class="ph codeph">b</code> (optional), <code class="ph codeph">m</code>, or
+                <code class="ph codeph">g</code>, either uppercase or lowercase. You can
+              specify floating-point values for megabytes and gigabytes, to
+              represent fractional numbers such as <code class="ph codeph">1.5</code>. You can
+              also specify it as a percentage of the physical memory by
+              specifying the suffix <code class="ph codeph">%</code>. 0 or no setting
+              indicates no limit. Defaults to bytes if no unit is given. Because
+              this limit applies cluster-wide, but each Impala node makes
+              independent decisions to run queries immediately or queue them, it
+              is a soft limit; the overall memory used by concurrent queries
+              might be slightly higher during times of heavy load. Ignored if
+                <code class="ph codeph">fair_scheduler_config_path</code> and
+                <code class="ph codeph">llama_site_path</code> are set. <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+        Impala relies on the statistics produced by the <code class="ph codeph">COMPUTE STATS</code> statement to estimate memory
+        usage for each query. See <a class="xref" href="../shared/../topics/impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for guidelines
+        about how and when to use this statement.
+      </div>
+              <p class="p">
+        <strong class="ph b">Type:</strong> string
+      </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">""</code> (empty string, meaning unlimited) </p>
+            </dd>
+
+
+            <dt class="dt dlterm" id="admission_config__disable_pool_max_requests">
+              <code class="ph codeph">disable_pool_max_requests</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Disables all per-pool limits on the maximum number
+              of running requests. <p class="p">
+                <strong class="ph b">Type:</strong> Boolean </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">false</code>
+              </p>
+            </dd>
+
+
+            <dt class="dt dlterm" id="admission_config__disable_pool_mem_limits">
+              <code class="ph codeph">disable_pool_mem_limits</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Disables all per-pool mem limits. <p class="p">
+                <strong class="ph b">Type:</strong> Boolean </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">false</code>
+              </p>
+            </dd>
+
+
+            <dt class="dt dlterm" id="admission_config__fair_scheduler_allocation_path">
+              <code class="ph codeph">fair_scheduler_allocation_path</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Path to the fair scheduler allocation file
+                (<code class="ph codeph">fair-scheduler.xml</code>). <p class="p">
+        <strong class="ph b">Type:</strong> string
+      </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">""</code> (empty string) </p>
+              <p class="p">
+                <strong class="ph b">Usage notes:</strong> Admission control only uses a small subset
+                of the settings that can go in this file, as described below.
+                For details about all the Fair Scheduler configuration settings,
+                see the <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache wiki</a>. </p>
+            </dd>
+
+
+            <dt class="dt dlterm" id="admission_config__llama_site_path">
+              <code class="ph codeph">llama_site_path</code>
+            </dt>
+            <dd class="dd">
+
+              <strong class="ph b">Purpose:</strong> Path to the configuration file used by admission control
+                (<code class="ph codeph">llama-site.xml</code>). If set,
+                <code class="ph codeph">fair_scheduler_allocation_path</code> must also be set.
+              <p class="p">
+        <strong class="ph b">Type:</strong> string
+      </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong> <code class="ph codeph">""</code> (empty string) </p>
+              <p class="p">
+                <strong class="ph b">Usage notes:</strong> Admission control only uses a few
+                of the settings that can go in this file, as described below.
+              </p>
+            </dd>
+
+        </dl>
+      </section>
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="admission_config__admission_config_manual">
+
+      <h3 class="title topictitle3" id="ariaid-title10">Configuring Admission Control Using the Command Line</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          To configure admission control, use a combination of startup options for the Impala daemon and edit
+          or create the configuration files <span class="ph filepath">fair-scheduler.xml</span> and
+            <span class="ph filepath">llama-site.xml</span>.
+        </p>
+
+        <p class="p">
+          For a straightforward configuration using a single resource pool named <code class="ph codeph">default</code>, you can
+          specify configuration options on the command line and skip the <span class="ph filepath">fair-scheduler.xml</span>
+          and <span class="ph filepath">llama-site.xml</span> configuration files.
+        </p>
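+
+        <p class="p">
+          For example, the following <span class="keyword cmdname">impalad</span> startup flags configure such a
+          single-pool setup. (The specific values are only an illustration, not recommendations.)
+        </p>
+
+<pre class="pre codeblock"><code>impalad --default_pool_max_requests=50 \
+    --default_pool_max_queued=100 \
+    --default_pool_mem_limit=200g \
+    --queue_wait_timeout_ms=30000 \
+    <var class="keyword varname">other_flags</var>
+</code></pre>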
+
+        <p class="p">
+          For an advanced configuration with multiple resource pools using different settings, set up the
+          <span class="ph filepath">fair-scheduler.xml</span> and <span class="ph filepath">llama-site.xml</span> configuration files
+          manually. Provide the paths to each one using the <span class="keyword cmdname">impalad</span> command-line options,
+          <code class="ph codeph">--fair_scheduler_allocation_path</code> and <code class="ph codeph">--llama_site_path</code> respectively.
+        </p>
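+
+        <p class="p">
+          For example (the file locations are illustrative; point the flags at wherever you placed the files):
+        </p>
+
+<pre class="pre codeblock"><code>impalad --fair_scheduler_allocation_path=/etc/impala/conf/fair-scheduler.xml \
+    --llama_site_path=/etc/impala/conf/llama-site.xml \
+    <var class="keyword varname">other_flags</var>
+</code></pre>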
+
+        <p class="p">
+          The Impala admission control feature only uses the Fair Scheduler configuration settings to determine how
+          to map users and groups to different resource pools. For example, you might set up different resource
+          pools with separate memory limits, and maximum number of concurrent and queued queries, for different
+          categories of users within your organization. For details about all the Fair Scheduler configuration
+          settings, see the
+          <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache
+          wiki</a>.
+        </p>
+
+        <p class="p">
+          The Impala admission control feature only uses a small subset of possible settings from the
+          <span class="ph filepath">llama-site.xml</span> configuration file:
+        </p>
+
+<pre class="pre codeblock"><code>llama.am.throttling.maximum.placed.reservations.<var class="keyword varname">queue_name</var>
+llama.am.throttling.maximum.queued.reservations.<var class="keyword varname">queue_name</var>
+<span class="ph">impala.admission-control.pool-default-query-options.<var class="keyword varname">queue_name</var>
+impala.admission-control.pool-queue-timeout-ms.<var class="keyword varname">queue_name</var></span>
+</code></pre>
+
+        <p class="p">
+          The <code class="ph codeph">impala.admission-control.pool-queue-timeout-ms</code>
+          setting specifies the timeout value for this pool, in milliseconds.
+          The<code class="ph codeph">impala.admission-control.pool-default-query-options</code>
+          settings designates the default query options for all queries that run
+          in this pool. Its argument value is a comma-delimited string of
+          'key=value' pairs, for example,<code class="ph codeph">'key1=val1,key2=val2'</code>.
+          For example, this is where you might set a default memory limit
+          for all queries in the pool, using an argument such as <code class="ph codeph">MEM_LIMIT=5G</code>.
+        </p>
+
+        <p class="p">
+          The <code class="ph codeph">impala.admission-control.*</code> configuration settings are available in
+          <span class="keyword">Impala 2.5</span> and higher.
+        </p>
+
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="admission_config__admission_examples">
+
+      <h3 class="title topictitle3" id="ariaid-title11">Example of Admission Control Configuration</h3>
+
+      <div class="body conbody">
+
+        <p class="p"> Here are sample <span class="ph filepath">fair-scheduler.xml</span> and
+          <span class="ph filepath">llama-site.xml</span> files that define resource pools
+          <code class="ph codeph">root.default</code>, <code class="ph codeph">root.development</code>, and
+          <code class="ph codeph">root.production</code>. These sample files are stripped down: in a real
+          deployment they might contain other settings for use with various aspects of the YARN
+          component. The settings shown here are the significant ones for the Impala admission
+          control feature. </p>
+
+        <p class="p">
+          <strong class="ph b">fair-scheduler.xml:</strong>
+        </p>
+
+        <p class="p">
+          Although Impala does not use the <code class="ph codeph">vcores</code> value, you must still specify it to satisfy
+          YARN requirements for the file contents.
+        </p>
+
+        <p class="p">
+          Each <code class="ph codeph">&lt;aclSubmitApps&gt;</code> tag (other than the one for <code class="ph codeph">root</code>) contains
+          a comma-separated list of users, then a space, then a comma-separated list of groups; these are the
+          users and groups allowed to submit Impala statements to the corresponding resource pool.
+        </p>
+
+        <p class="p">
+          If you leave the <code class="ph codeph">&lt;aclSubmitApps&gt;</code> element empty for a pool, nobody can submit
+          directly to that pool; child pools can specify their own <code class="ph codeph">&lt;aclSubmitApps&gt;</code> values
+          to authorize users and groups to submit to those pools.
+        </p>
+
+        <pre class="pre codeblock"><code>&lt;allocations&gt;
+
+    &lt;queue name="root"&gt;
+        &lt;aclSubmitApps&gt; &lt;/aclSubmitApps&gt;
+        &lt;queue name="default"&gt;
+            &lt;maxResources&gt;50000 mb, 0 vcores&lt;/maxResources&gt;
+            &lt;aclSubmitApps&gt;*&lt;/aclSubmitApps&gt;
+        &lt;/queue&gt;
+        &lt;queue name="development"&gt;
+            &lt;maxResources&gt;200000 mb, 0 vcores&lt;/maxResources&gt;
+            &lt;aclSubmitApps&gt;user1,user2 dev,ops,admin&lt;/aclSubmitApps&gt;
+        &lt;/queue&gt;
+        &lt;queue name="production"&gt;
+            &lt;maxResources&gt;1000000 mb, 0 vcores&lt;/maxResources&gt;
+            &lt;aclSubmitApps&gt; ops,admin&lt;/aclSubmitApps&gt;
+        &lt;/queue&gt;
+    &lt;/queue&gt;
+    &lt;queuePlacementPolicy&gt;
+        &lt;rule name="specified" create="false"/&gt;
+        &lt;rule name="default" /&gt;
+    &lt;/queuePlacementPolicy&gt;
+&lt;/allocations&gt;
+
+</code></pre>
+
+        <p class="p">
+          <strong class="ph b">llama-site.xml:</strong>
+        </p>
+
+        <pre class="pre codeblock"><code>
+&lt;?xml version="1.0" encoding="UTF-8"?&gt;
+&lt;configuration&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.default&lt;/name&gt;
+    &lt;value&gt;10&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.default&lt;/name&gt;
+    &lt;value&gt;50&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-default-query-options.root.default&lt;/name&gt;
+    &lt;value&gt;mem_limit=128m,query_timeout_s=20,max_io_buffers=10&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-queue-timeout-ms.root.default&lt;/name&gt;
+    &lt;value&gt;30000&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.development&lt;/name&gt;
+    &lt;value&gt;50&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.development&lt;/name&gt;
+    &lt;value&gt;100&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-default-query-options.root.development&lt;/name&gt;
+    &lt;value&gt;mem_limit=256m,query_timeout_s=30,max_io_buffers=10&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-queue-timeout-ms.root.development&lt;/name&gt;
+    &lt;value&gt;15000&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.production&lt;/name&gt;
+    &lt;value&gt;100&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.production&lt;/name&gt;
+    &lt;value&gt;200&lt;/value&gt;
+  &lt;/property&gt;
+&lt;!--
+       Default query options for the 'root.production' pool.
+       THIS IS A NEW PARAMETER in Impala 2.5.
+       Note that the MEM_LIMIT query option still shows up in here even though it is a
+       separate box in the UI. We do that because it is the most important query option
+       that people will need (everything else is somewhat advanced).
+
+       MEM_LIMIT takes a per-node memory limit which is specified using one of the following:
+        - '&lt;int&gt;[bB]?'  -&gt; bytes (default if no unit given)
+        - '&lt;float&gt;[mM(bB)]' -&gt; megabytes
+        - '&lt;float&gt;[gG(bB)]' -&gt; in gigabytes
+        E.g. 'MEM_LIMIT=12345' (no unit) means 12345 bytes, and you can append m or g
+             to specify megabytes or gigabytes, though that is not required.
+--&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-default-query-options.root.production&lt;/name&gt;
+    &lt;value&gt;mem_limit=386m,query_timeout_s=30,max_io_buffers=10&lt;/value&gt;
+  &lt;/property&gt;
+&lt;!--
+  Default queue timeout (ms) for the pool 'root.production'.
+  If this isn't set, the process-wide flag is used.
+  THIS IS A NEW PARAMETER in Impala 2.5.
+--&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-queue-timeout-ms.root.production&lt;/name&gt;
+    &lt;value&gt;30000&lt;/value&gt;
+  &lt;/property&gt;
+&lt;/configuration&gt;
+
+</code></pre>
+
+      </div>
+    </article>
+
+
+
+  <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="admission_config__admission_guidelines">
+
+    <h3 class="title topictitle3" id="ariaid-title12">Guidelines for Using Admission Control</h3>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        To see how admission control works for particular queries, examine the profile output for the query. This
+        information is available through the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span>
+        immediately after running a query in the shell, on the <span class="ph uicontrol">queries</span> page of the Impala
+        debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
+        level 2). The profile output contains details about the admission decision, such as whether the query was
+        queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
+        usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
+      </p>
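+
+      <p class="p">
+        For example, in <span class="keyword cmdname">impala-shell</span> you can run a query and then immediately
+        issue the <code class="ph codeph">PROFILE</code> command; the admission details, such as the resource pool the
+        query was assigned to, appear in the resulting output. (The query and host name are illustrative.)
+      </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] &gt; select count(*) from web_logs;
+[impala-host:21000] &gt; profile;
+</code></pre>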
+
+      <p class="p">
+        Remember that the limits imposed by admission control are <span class="q">"soft"</span> limits.
+        The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
+        to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
+        between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
+        concurrently, then throughput could decrease due to queries spilling to disk or contending for resources;
+        or queries could be cancelled if they exceed the <code class="ph codeph">MEM_LIMIT</code> setting while running.
+      </p>
+
+
+
+      <p class="p">
+        In <span class="keyword cmdname">impala-shell</span>, you can also specify which resource pool to direct queries to by
+        setting the <code class="ph codeph">REQUEST_POOL</code> query option.
+      </p>
+
+      <p class="p">
+        The statements affected by the admission control feature are primarily queries, but also include statements
+        that write data such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>. Most write
+        operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
+        memory due to buffering intermediate data before writing out each Parquet data block. See
+        <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a> for instructions about inserting data efficiently into
+        Parquet tables.
+      </p>
+
+      <p class="p">
+        Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query
+        is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session
+        are also queued so that they are processed in the correct order:
+      </p>
+
+<pre class="pre codeblock"><code>-- This query could be queued to avoid out-of-memory at times of heavy load.
+select * from huge_table join enormous_table using (id);
+-- If so, this subsequent statement in the same session is also queued
+-- until the previous statement completes.
+drop table huge_table;
+</code></pre>
+
+      <p class="p">
+        If you set up different resource pools for different users and groups, consider reusing any classifications
+        you developed for use with Sentry security. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+      </p>
+
+      <p class="p">
+        For details about all the Fair Scheduler configuration settings, see
+        <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Fair Scheduler Configuration</a>, in particular the tags such as <code class="ph codeph">&lt;queue&gt;</code> and
+        <code class="ph codeph">&lt;aclSubmitApps&gt;</code> to map users and groups to particular resource pools (queues).
+      </p>
+
+
+    </div>
+  </article>
+</article>
+</article></main></body></html>