Posted to commits@impala.apache.org by jr...@apache.org on 2017/11/13 23:26:52 UTC
[2/7] incubator-impala git commit: Update Impala docs for 2.10 release

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_scalability.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_scalability.html b/docs/build/html/topics/impala_scalability.html
index 41e1d10..e361005 100644
--- a/docs/build/html/topics/impala_scalability.html
+++ b/docs/build/html/topics/impala_scalability.html
@@ -1,6 +1,6 @@
 <!DOCTYPE html
   SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x">
 <meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scalability"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Scalability Considerations for Impala</title></head><body id="scalability"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name
 ="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scalability"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Scalability Considerations for Impala</title></head><body id="scalability"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
 
   <h1 class="title topictitle1" id="ariaid-title1">Scalability Considerations for Impala</h1>
   
@@ -319,13 +319,45 @@
     </div>
   </article>
 
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="scalability__scalability_buffer_pool">
+    <h2 class="title topictitle2" id="ariaid-title5">Effect of Buffer Pool on Memory Usage (<span class="keyword">Impala 2.10</span> and higher)</h2>
+    <div class="body conbody">
+      <p class="p">
+        The buffer pool feature, available in <span class="keyword">Impala 2.10</span> and higher, changes the
+        way Impala allocates memory during a query. Most of the memory needed is reserved at the
+        beginning of the query, avoiding cases where a query might run for a long time before failing
+        with an out-of-memory error. The actual memory estimates and memory buffers are typically
+        smaller than before, so that more queries can run concurrently or process larger volumes
+        of data than previously.
+      </p>
+      <p class="p">
+        The buffer pool feature includes some query options that you can fine-tune:
+        <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+        <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+        <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>, and
+        <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>.
+      </p>
+      <p class="p">
+        Most of the effects of the buffer pool are transparent to you as an Impala user.
+        Memory use during spilling is now steadier and more predictable, instead of
+        increasing rapidly as more data is spilled to disk. The main change from a user
+        perspective is the need to increase the <code class="ph codeph">MAX_ROW_SIZE</code> query option
+        setting when querying tables with columns containing long strings, many columns,
+        or other combinations of factors that produce very large rows. If Impala encounters
+        rows that are too large to process with the default query option settings, the query
+        fails with an error message that suggests increasing the <code class="ph codeph">MAX_ROW_SIZE</code>
+        setting.
+      </p>
+    </div>
+  </article>
+
   
 
   
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="scalability__spill_to_disk">
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="scalability__spill_to_disk">
 
-    <h2 class="title topictitle2" id="ariaid-title5">SQL Operations that Spill to Disk</h2>
+    <h2 class="title topictitle2" id="ariaid-title6">SQL Operations that Spill to Disk</h2>
 
     <div class="body conbody">
 
@@ -341,6 +373,14 @@
         you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.
       </p>
 
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          In <span class="keyword">Impala 2.10</span> and higher, also see <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for
+          changes to Impala memory allocation that might change the details of which queries spill to disk,
+          and how much memory and disk space is involved in the spilling operation.
+        </p>
+      </div>
+
       <p class="p">
         <strong class="ph b">What kinds of queries might spill to disk:</strong>
       </p>
@@ -412,42 +452,40 @@
       </p>
 
       <p class="p">
-        The infrastructure of the spilling feature affects the way the affected SQL operators, such as
-        <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">DISTINCT</code>, and joins, use memory.
-        On each host that participates in the query, each such operator in a query accumulates memory
-        while building the data structure to process the aggregation or join operation. The amount
-        of memory used depends on the portion of the data being handled by that host, and thus might
-        be different from one host to another. When the amount of memory being used for the operator
-        on a particular host reaches a threshold amount, Impala reserves an additional memory buffer
-        to use as a work area in case that operator causes the query to exceed the memory limit for
-        that host. After allocating the memory buffer, the memory used by that operator remains
-        essentially stable or grows only slowly, until the point where the memory limit is reached
-        and the query begins writing temporary data to disk.
+        <span class="keyword">Impala 2.10</span> and higher changes how SQL operators such as <code class="ph codeph">GROUP BY</code>,
+        <code class="ph codeph">DISTINCT</code>, and joins transition between using additional memory and activating the
+        spill-to-disk feature. The memory required to spill to disk is reserved up front, and you can
+        examine it in the <code class="ph codeph">EXPLAIN</code> plan when the <code class="ph codeph">EXPLAIN_LEVEL</code> query option is
+        set to 2 or higher.
       </p>
 
-      <p class="p">
-        Prior to Impala 2.2, the extra memory buffer for an operator that might spill to disk
-        was allocated when the data structure used by the applicable SQL operator reaches 16 MB in size,
-        and the memory buffer itself was 512 MB. In Impala 2.2, these values are halved: the threshold value
-        is 8 MB and the memory buffer is 256 MB. <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, the memory for the buffer
-        is allocated in pieces, only as needed, to avoid sudden large jumps in memory usage.</span> A query that uses
-        multiple such operators might allocate multiple such memory buffers, as the size of the data structure
-        for each operator crosses the threshold on a particular host.
-      </p>
+     <p class="p">
+       The infrastructure of the spilling feature affects the way SQL operators such as
+       <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">DISTINCT</code>, and joins use memory.
+       On each host that participates in the query, each such operator in a query requires memory
+       to store rows of data and other data structures. For each operator that supports spill-to-disk,
+       Impala reserves up front an amount of memory that is sufficient to execute the
+       operator. If an operator accumulates more data than can fit in the reserved memory, it
+       can either reserve more memory to continue processing data in memory or start spilling
+       data to temporary scratch files on disk. Thus, operators with spill-to-disk support
+       can adapt to different memory constraints by using however much memory is available
+       to speed up execution, yet tolerate low memory conditions by spilling data to disk.
+     </p>
+     
+     <p class="p">
+       The amount of data an operator accumulates depends on the portion of the data being handled by that host, and thus
+       the operator may end up consuming different amounts of memory on different hosts.
+     </p>
+
 
-      <p class="p">
-        Therefore, a query that processes a relatively small amount of data on each host would likely
-        never reach the threshold for any operator, and would never allocate any extra memory buffers. A query
-        that did process millions of groups, distinct values, join keys, and so on might cross the threshold,
-        causing its memory requirement to rise suddenly and then flatten out. The larger the cluster, less data is processed
-        on any particular host, thus reducing the chance of requiring the extra memory allocation.
-      </p>
 
       <p class="p">
         <strong class="ph b">Added in:</strong> This feature was added to the <code class="ph codeph">ORDER BY</code> clause in Impala 1.4.
         This feature was extended to cover join queries, aggregation functions, and analytic
         functions in Impala 2.0. The size of the memory work area required by
         each operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
+        <span class="ph">The spilling mechanism was reworked to take advantage of the
+        Impala buffer pool feature and be more predictable and stable in <span class="keyword">Impala 2.10</span>.</span>
       </p>
 
       <p class="p">
@@ -467,8 +505,10 @@
             <li class="li">
               The output of the <code class="ph codeph">PROFILE</code> command in the <span class="keyword cmdname">impala-shell</span>
               interpreter. This data shows the memory usage for each host and in total across the cluster. The
-              <code class="ph codeph">BlockMgr.BytesWritten</code> counter reports how much data was written to disk during the
-              query.
+              <code class="ph codeph">WriteIoBytes</code> counter reports how much data was written to disk for each operator
+              during the query. (In <span class="keyword">Impala 2.9</span>, the counter was named
+              <code class="ph codeph">ScratchBytesWritten</code>; in <span class="keyword">Impala 2.8</span> and earlier, it was named
+              <code class="ph codeph">BytesWritten</code>.)
             </li>
 
             <li class="li">
@@ -571,34 +611,15 @@
 
       <p class="p">
         Issue the <code class="ph codeph">PROFILE</code> command to get a detailed breakdown of the memory usage on each node
-        during the query. The crucial part of the profile output concerning memory is the <code class="ph codeph">BlockMgr</code>
-        portion. For example, this profile shows that the query did not quite exceed the memory limit.
-      </p>
-
-<pre class="pre codeblock"><code>BlockMgr:
-   - BlockWritesIssued: 1
-   - BlockWritesOutstanding: 0
-   - BlocksCreated: 24
-   - BlocksRecycled: 1
-   - BufferedPins: 0
-   - MaxBlockSize: 8.00 MB (8388608)
-   <strong class="ph b">- MemoryLimit: 200.00 MB (209715200)</strong>
-   <strong class="ph b">- PeakMemoryUsage: 192.22 MB (201555968)</strong>
-   - TotalBufferWaitTime: 0ns
-   - TotalEncryptionTime: 0ns
-   - TotalIntegrityCheckTime: 0ns
-   - TotalReadBlockTime: 0ns
-</code></pre>
-
-      <p class="p">
-        In this case, because the memory limit was already below any recommended value, I increased the volume of
-        data for the query rather than reducing the memory limit any further.
+        during the query.
+        
       </p>
 
+
+
       <p class="p">
         Set the <code class="ph codeph">MEM_LIMIT</code> query option to a value that is smaller than the peak memory usage
-        reported in the profile output. Do not specify a memory limit lower than about 300 MB, because with such a
-        low limit, queries could fail to start for other reasons. Now try the memory-intensive query again.
+        reported in the profile output. Now try the memory-intensive query again.
       </p>
 
       <p class="p">
@@ -687,8 +708,8 @@ these tables, hint the plan or disable this behavior via query options to enable
     </div>
   </article>
 
-<article class="topic concept nested1" aria-labelledby="ariaid-title6" id="scalability__complex_query">
-<h2 class="title topictitle2" id="ariaid-title6">Limits on Query Size and Complexity</h2>
+<article class="topic concept nested1" aria-labelledby="ariaid-title7" id="scalability__complex_query">
+<h2 class="title topictitle2" id="ariaid-title7">Limits on Query Size and Complexity</h2>
 <div class="body conbody">
 <p class="p">
 There are hardcoded limits on the maximum size and complexity of queries.
@@ -712,8 +733,8 @@ use a single <code class="ph codeph">IN</code> clause:
 </div>
 </article>
 
-<article class="topic concept nested1" aria-labelledby="ariaid-title7" id="scalability__scalability_io">
-<h2 class="title topictitle2" id="ariaid-title7">Scalability Considerations for Impala I/O</h2>
+<article class="topic concept nested1" aria-labelledby="ariaid-title8" id="scalability__scalability_io">
+<h2 class="title topictitle2" id="ariaid-title8">Scalability Considerations for Impala I/O</h2>
 <div class="body conbody">
 <p class="p">
 Impala parallelizes its I/O operations aggressively,
@@ -738,26 +759,34 @@ Currently, there is no throttling mechanism for Impala I/O.
 </div>
 </article>
 
-<article class="topic concept nested1" aria-labelledby="ariaid-title8" id="scalability__big_tables">
-<h2 class="title topictitle2" id="ariaid-title8">Scalability Considerations for Table Layout</h2>
-<div class="body conbody">
-<p class="p">
-Due to the overhead of retrieving and updating table metadata
-in the metastore database, try to limit the number of columns
-in a table to a maximum of approximately 2000.
-Although Impala can handle wider tables than this, the metastore overhead
-can become significant, leading to query performance that is slower
-than expected based on the actual data volume.
-</p>
-<p class="p">
-To minimize overhead related to the metastore database and Impala query planning,
-try to limit the number of partitions for any partitioned table to a few tens of thousands.
-</p>
-</div>
-</article>
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="scalability__big_tables">
+    <h2 class="title topictitle2" id="ariaid-title9">Scalability Considerations for Table Layout</h2>
+    <div class="body conbody">
+      <p class="p">
+        Due to the overhead of retrieving and updating table metadata
+        in the metastore database, try to limit the number of columns
+        in a table to a maximum of approximately 2000.
+        Although Impala can handle wider tables than this, the metastore overhead
+        can become significant, leading to query performance that is slower
+        than expected based on the actual data volume.
+      </p>
+      <p class="p">
+        To minimize overhead related to the metastore database and Impala query planning,
+        try to limit the number of partitions for any partitioned table to a few tens of thousands.
+      </p>
+      <p class="p">
+        If the volume of data within a table makes it impractical to run exploratory
+        queries, consider using the <code class="ph codeph">TABLESAMPLE</code> clause to limit query processing
+        to only a percentage of data within the table. This technique reduces the overhead
+        for query startup, I/O to read the data, and the amount of network, CPU, and memory
+        needed to process intermediate results during the query. See <a class="xref" href="impala_tablesample.html">TABLESAMPLE Clause</a>
+        for details.
+      </p>
+    </div>
+  </article>
 
-<article class="topic concept nested1" aria-labelledby="ariaid-title9" id="scalability__kerberos_overhead_cluster_size">
-<h2 class="title topictitle2" id="ariaid-title9">Kerberos-Related Network Overhead for Large Clusters</h2>
+<article class="topic concept nested1" aria-labelledby="ariaid-title10" id="scalability__kerberos_overhead_cluster_size">
+<h2 class="title topictitle2" id="ariaid-title10">Kerberos-Related Network Overhead for Large Clusters</h2>
 <div class="body conbody">
 <p class="p">
 When Impala starts up, or after each <code class="ph codeph">kinit</code> refresh, Impala sends a number of
@@ -782,8 +811,8 @@ so other secure services might be affected temporarily.
 </div>
 </article>
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="scalability__kerberos_overhead_memory_usage">
-  <h2 class="title topictitle2" id="ariaid-title10">Kerberos-Related Memory Overhead for Large Clusters</h2>
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="scalability__kerberos_overhead_memory_usage">
+  <h2 class="title topictitle2" id="ariaid-title11">Kerberos-Related Memory Overhead for Large Clusters</h2>
   <div class="body conbody">
     <div class="p">
         On a kerberized cluster with high memory utilization, <span class="keyword cmdname">kinit</span> commands executed after
@@ -817,8 +846,8 @@ vm.overcommit_memory=1
   </div>
   </article>
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="scalability__scalability_hotspots">
-    <h2 class="title topictitle2" id="ariaid-title11">Avoiding CPU Hotspots for HDFS Cached Data</h2>
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="scalability__scalability_hotspots">
+    <h2 class="title topictitle2" id="ariaid-title12">Avoiding CPU Hotspots for HDFS Cached Data</h2>
     <div class="body conbody">
       <p class="p">
         You can use the HDFS caching feature, described in <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
@@ -855,4 +884,68 @@ vm.overcommit_memory=1
     </div>
   </article>
 
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="scalability__scalability_file_handle_cache">
+    <h2 class="title topictitle2" id="ariaid-title13">Scalability Considerations for NameNode Traffic with File Handle Caching</h2>
+    <div class="body conbody">
+      <p class="p">
+        One scalability aspect that affects heavily loaded clusters is the load on the HDFS
+        NameNode, from looking up the details as each HDFS file is opened. Impala queries
+        often access many different HDFS files, for example if a query does a full table scan
+        on a table with thousands of partitions, each partition containing multiple data files.
+        Accessing each column of a Parquet file also involves a separate <span class="q">"open"</span> call,
+        further increasing the load on the NameNode. High NameNode overhead can add startup time
+        (that is, increase latency) to Impala queries, and reduce overall throughput for non-Impala
+        workloads that also require accessing HDFS files.
+      </p>
+      <p class="p">
+        In <span class="keyword">Impala 2.10</span> and higher, you can reduce NameNode overhead by enabling
+        a caching feature for HDFS file handles. Data files that are accessed by different queries,
+        or even multiple times within the same query, can be accessed without a new <span class="q">"open"</span>
+        call and without fetching the file details again from the NameNode.
+      </p>
+      <p class="p">
+        Because this feature only involves HDFS data files, it does not apply to non-HDFS tables,
+        such as Kudu or HBase tables, or tables that store their data on cloud services such as
+        S3 or ADLS. Any read operations that perform remote reads also skip the cached file handles.
+      </p>
+      <p class="p">
+        This feature is turned off by default. To enable it, set the configuration option
+        <code class="ph codeph">max_cached_file_handles</code> to a non-zero value for each <span class="keyword cmdname">impalad</span>
+        daemon. Consider a starting value of 20,000, and adjust upward if NameNode
+        overhead is still significant, or downward if it is more important to reduce the extra memory usage
+        on each host. Each cache entry consumes 6 KB, meaning that caching 20,000 file handles requires
+        up to 120 MB on each DataNode. The exact memory usage varies depending on how many file handles
+        have actually been cached; memory is freed as file handles are evicted from the cache.
+      </p>
+      <p class="p">
+        If a manual HDFS operation moves a file to the HDFS trashcan while the file handle is cached,
+        Impala still accesses the contents of that file. This is a change from prior behavior. Previously,
+        accessing a file that was in the trashcan would cause an error. This behavior only applies to
+        non-Impala methods of removing HDFS files, not the Impala mechanisms such as <code class="ph codeph">TRUNCATE TABLE</code>
+        or <code class="ph codeph">DROP TABLE</code>.
+      </p>
+      <p class="p">
+        If files are removed, replaced, or appended by HDFS operations outside of Impala, the way to bring the
+        file information up to date is to run the <code class="ph codeph">REFRESH</code> statement on the table.
+      </p>
+      <p class="p">
+        File handle cache entries are evicted as the cache fills up, or based on a timeout period
+        when they have not been accessed for some time.
+      </p>
+      <p class="p">
+        To evaluate the effectiveness of file handle caching for a particular workload, issue the
+        <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span> or examine query
+        profiles in the Impala web UI. Look for the ratio of <code class="ph codeph">CachedFileHandlesHitCount</code>
+        (ideally, should be high) to <code class="ph codeph">CachedFileHandlesMissCount</code> (ideally, should be low).
+        Before starting any evaluation, run some representative queries to <span class="q">"warm up"</span> the cache,
+        because the first time each data file is accessed is always recorded as a cache miss.
+        To see metrics about file handle caching for each <span class="keyword cmdname">impalad</span> instance,
+        examine the <span class="ph uicontrol">/metrics</span> page in the Impala web UI, in particular the fields
+        <span class="ph uicontrol">impala-server.io.mgr.cached-file-handles-miss-count</span>,
+        <span class="ph uicontrol">impala-server.io.mgr.cached-file-handles-hit-count</span>, and
+        <span class="ph uicontrol">impala-server.io.mgr.num-cached-file-handles</span>.
+      </p>
+    </div>
+  </article>
+
 </article></main></body></html>
\ No newline at end of file
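
As a rough illustration of the file handle cache guidance added to impala_scalability.html above, the following Python sketch estimates the upper bound on cache memory for a given `max_cached_file_handles` value and computes a hit ratio from the `CachedFileHandlesHitCount` and `CachedFileHandlesMissCount` profile counters. The function names are hypothetical. The 6 KB per-entry figure comes from the doc text, and decimal units (1 MB = 1000 KB) are an assumption chosen to match the 120 MB figure quoted there.

```python
# Sizing sketch for the HDFS file handle cache; not part of Impala itself.
# Assumption: each cache entry consumes 6 KB, as stated in the doc text.
KB_PER_ENTRY = 6


def cache_memory_mb(max_cached_file_handles: int) -> float:
    """Upper bound on cache memory per host, in decimal MB (1 MB = 1000 KB)."""
    return max_cached_file_handles * KB_PER_ENTRY / 1000


def hit_ratio(hit_count: int, miss_count: int) -> float:
    """Fraction of file handle lookups served from the cache."""
    total = hit_count + miss_count
    # An empty profile (no lookups yet) is reported as a ratio of 0.0.
    return hit_count / total if total else 0.0


if __name__ == "__main__":
    print(cache_memory_mb(20000))
    print(hit_ratio(900, 100))
```

Because the first access to each data file is always a miss, the ratio is only meaningful after the cache has been warmed up with representative queries.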

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_select.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_select.html b/docs/build/html/topics/impala_select.html
index 7a12c42..fee5e26 100644
--- a/docs/build/html/topics/impala_select.html
+++ b/docs/build/html/topics/impala_select.html
@@ -1,6 +1,6 @@
 <!DOCTYPE html
   SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_order_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_having.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_offset.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_union.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_subqueries.html"><meta name="DC.Relation" scheme="U
 RI" content="../topics/impala_with.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hints.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="select"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SELECT Statement</title></head><body id="select"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_order_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_having.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_offset.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_union.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_subqueries.html"><meta name="DC.Relation" scheme="U
 RI" content="../topics/impala_tablesample.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_with.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hints.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="select"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SELECT Statement</title></head><body id="select"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
 
   <h1 class="title topictitle1" id="ariaid-title1">SELECT Statement</h1>
   
@@ -33,11 +33,14 @@ FROM <em class="ph i">table_reference</em> [, <em class="ph i">table_reference</
   JOIN <em class="ph i">table_reference</em>
   [ON <em class="ph i">join_equality_clauses</em> | USING (<var class="keyword varname">col1</var>[, <var class="keyword varname">col2</var> ...]] ...
 WHERE <em class="ph i">conditions</em>
-GROUP BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] }
+GROUP BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [, ...] }
 HAVING <code class="ph codeph">conditions</code>
-GROUP BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [ASC | DESC] [, ...] }
+ORDER BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] }
 LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>]
 [UNION [ALL] <em class="ph i">select_statement</em>] ...]
+
+table_reference := { <var class="keyword varname">table_name</var> | (<var class="keyword varname">subquery</var>) }
+  <span class="ph">[ TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) [REPEATABLE(<var class="keyword varname">seed</var>)] ]</span>
 </code></pre>
 
     <p class="p">
@@ -107,7 +110,7 @@ LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>]
         are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2
         and higher. During performance tuning, you can override the reordering of join clauses that Impala does
         internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
-        <code class="ph codeph">SELECT</code> keyword
+        <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords.
       </p>
         <p class="p">
           See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details and examples of join queries.
@@ -145,6 +148,12 @@ LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>]
         <code class="ph codeph">LIKE</code>, <code class="ph codeph">IN</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">COALESCE</code>. Impala
         specifically supports built-ins described in <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
       </li>
+
+      <li class="li">
+        In <span class="keyword">Impala 2.9</span> and higher, an optional <code class="ph codeph">TABLESAMPLE</code>
+        clause immediately after a table reference, to specify that the query only processes a
+        specified percentage of the table data. See <a class="xref" href="impala_tablesample.html">TABLESAMPLE Clause</a> for details.
+      </li>
     </ul>
 
     <p class="p">
@@ -224,4 +233,4 @@ LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>]
   </div>
 
   
-<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_joins.html">Joins in Impala SELECT Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_order_by.html">ORDER BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_by.html">GROUP BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_having.html">HAVING Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_limit.html">LIMIT Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_offset.html">OFFSET Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_union.html">UNION Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_subqueries.html">Subqueries in Impala SELECT Statements</a></strong><
 br></li><li class="link ulchildlink"><strong><a href="../topics/impala_with.html">WITH Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_distinct.html">DISTINCT Operator</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hints.html">Query Hints in Impala SELECT Statements</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_joins.html">Joins in Impala SELECT Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_order_by.html">ORDER BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_by.html">GROUP BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_having.html">HAVING Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_limit.html">LIMIT Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_offset.html">OFFSET Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_union.html">UNION Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_subqueries.html">Subqueries in Impala SELECT Statements</a></strong><
 br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tablesample.html">TABLESAMPLE Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_with.html">WITH Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_distinct.html">DISTINCT Operator</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hints.html">Query Hints in Impala SELECT Statements</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_shell_commands.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_shell_commands.html b/docs/build/html/topics/impala_shell_commands.html
index d2bee6c..43159a8 100644
--- a/docs/build/html/topics/impala_shell_commands.html
+++ b/docs/build/html/topics/impala_shell_commands.html
@@ -218,6 +218,30 @@
               </p>
             </td>
           </tr>
+          <tr class="row" id="shell_commands__rerun_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">rerun</code> or <code class="ph codeph">@</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Executes a previous <span class="keyword cmdname">impala-shell</span> command again,
+                from the list of commands displayed by the <code class="ph codeph">history</code>
+                command. These could be SQL statements, or commands specific to
+                <span class="keyword cmdname">impala-shell</span> such as <code class="ph codeph">quit</code>
+                or <code class="ph codeph">profile</code>.
+              </p>
+              <p class="p">
+                Specify an integer argument. A positive integer <code class="ph codeph">N</code>
+                represents the command labelled <code class="ph codeph">N</code> in the history list.
+                A negative integer <code class="ph codeph">-N</code> represents the <code class="ph codeph">N</code>th
+                command from the end of the list, such as <code class="ph codeph">-1</code> for the most recent command.
+                Commands that are executed again do not produce new entries in the
+                history list.
+              </p>
+            </td>
+          </tr>
           <tr class="row" id="shell_commands__select_cmd">
             <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
               <p class="p">

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_shell_running_commands.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_shell_running_commands.html b/docs/build/html/topics/impala_shell_running_commands.html
index e0e8880..15abef7 100644
--- a/docs/build/html/topics/impala_shell_running_commands.html
+++ b/docs/build/html/topics/impala_shell_running_commands.html
@@ -1,6 +1,6 @@
 <!DOCTYPE html
   SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_running_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Running Commands and SQL Statements in impala-shell</title></head><body id="shell_running_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_running_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Running Commands and SQL Statements in impala-shell</title></head><body id="shell_running_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
 
   <h1 class="title topictitle1" id="ariaid-title1">Running Commands and SQL Statements in impala-shell</h1>
   
@@ -254,4 +254,69 @@ Fetched 5 row(s) in 0.01s
 </code></pre>
 
   </div>
-<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
\ No newline at end of file
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="shell_running_commands__rerun">
+    <h2 class="title topictitle2" id="ariaid-title2">Rerunning impala-shell Commands</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        In <span class="keyword">Impala 2.10</span> and higher, you can use the
+        <code class="ph codeph">rerun</code> command, or its abbreviation <code class="ph codeph">@</code>,
+        to re-execute commands from the history list. The argument can be
+        a positive integer (reflecting the number shown in <code class="ph codeph">history</code>
+        output) or a negative integer (reflecting the Nth-from-last command in the
+        <code class="ph codeph">history</code> output). For example:
+      </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; select * from p1 order by t limit 5;
+...
+[localhost:21000] &gt; show table stats p1;
++-----------+--------+--------+------------------------------------------------------------+
+| #Rows     | #Files | Size   | Location                                                   |
++-----------+--------+--------+------------------------------------------------------------+
+| 134217728 | 50     | 4.66MB | hdfs://test.example.com:8020/user/hive/warehouse/jdr.db/p1 |
++-----------+--------+--------+------------------------------------------------------------+
+[localhost:21000] &gt; compute stats p1;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; history;
+[1]: use jdr;
+[2]: history;
+[3]: show tables;
+[4]: select * from p1 order by t limit 5;
+[5]: show table stats p1;
+[6]: compute stats p1;
+[7]: history;
+[localhost:21000] &gt; @-2; &lt;- Rerun the 2nd last command in the history list
+Rerunning compute stats p1;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; history; &lt;- History list is not updated by rerunning commands
+                                or by repeating the last command, in this case 'history'.
+[1]: use jdr;
+[2]: history;
+[3]: show tables;
+[4]: select * from p1 order by t limit 5;
+[5]: show table stats p1;
+[6]: compute stats p1;
+[7]: history;
+[localhost:21000] &gt; @4; &lt;- Rerun command #4 in the history list using short form '@'.
+Rerunning select * from p1 order by t limit 5;
+...
+[localhost:21000] &gt; rerun 4; &lt;- Rerun command #4 using long form 'rerun'.
+Rerunning select * from p1 order by t limit 5;
+...
+
+</code></pre>
+
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file
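
The indexing rule described above for rerun and @ (a positive N selects the entry labelled N, a negative -N selects the Nth entry from the end) can be sketched in Python. This is only an illustration of the rule as documented, not the impala-shell implementation. The function name resolve_history_index, the HISTORY list, and the handling of 0 and out-of-range values are assumptions made for the sketch.

```python
# Hypothetical helper: models how `rerun N` / `@N` picks a history entry.
# HISTORY mirrors the history output shown in the documentation example.
HISTORY = [
    "use jdr;",
    "history;",
    "show tables;",
    "select * from p1 order by t limit 5;",
    "show table stats p1;",
    "compute stats p1;",
    "history;",
]


def resolve_history_index(history, n):
    """Return the command selected by an integer argument n.

    history is ordered oldest first, so history[0] is displayed as [1].
    A positive n selects the entry labelled n; a negative n selects the
    abs(n)-th entry counted from the end of the list.
    """
    # Assumption: 0 and out-of-range values are rejected.
    if n == 0 or abs(n) > len(history):
        raise ValueError(f"no history entry for argument {n}")
    # Python's negative indexing already counts from the end of the list.
    return history[n - 1] if n > 0 else history[n]


if __name__ == "__main__":
    print(resolve_history_index(HISTORY, -2))  # second-to-last entry
    print(resolve_history_index(HISTORY, 4))   # entry labelled [4]
```

With the history list from the example, an argument of -2 selects "compute stats p1;" and an argument of 4 selects the SELECT statement, matching the "Rerunning ..." lines in the documented session.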

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_ssl.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ssl.html b/docs/build/html/topics/impala_ssl.html
index a9b4d25..e995159 100644
--- a/docs/build/html/topics/impala_ssl.html
+++ b/docs/build/html/topics/impala_ssl.html
@@ -116,4 +116,75 @@
     </div>
   </article>
 
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="ssl__tls_min_version">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Specifying TLS/SSL Minimum Allowed Version and Ciphers</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Depending on your cluster configuration and the security practices in your
+        organization, you might need to restrict the allowed versions of TLS/SSL
+        used by Impala. Older TLS/SSL versions might have vulnerabilities or lack
+        certain features. In <span class="keyword">Impala 2.10</span>, you can use startup
+        options for the <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>,
+        and <span class="keyword cmdname">statestored</span> daemons to specify a minimum allowed
+        version of TLS/SSL.
+      </p>
+
+      <p class="p">
+        Specify one of the following values for the <code class="ph codeph">--ssl_minimum_version</code>
+        configuration setting:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">tlsv1</code>: Allow any TLS version of 1.0 or higher.
+            This setting is the default when TLS/SSL is enabled.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">tlsv1.1</code>: Allow any TLS version of 1.1 or higher.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">tlsv1.2</code>: Allow any TLS version of 1.2 or higher.
+          </p>
+        </li>
+      </ul>
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          As of <span class="keyword">Impala 2.10</span>, TLSv1.2 may not work for Impala on RHEL 6
+          or CentOS 6, even if OpenSSL 1.0.1 is available. The daemons fail to start, with a
+          socket error stating the TLS version is not supported. The underlying cause is related to
+          <a class="xref" href="https://bugzilla.redhat.com/show_bug.cgi?id=1497859" target="_blank">Red Hat issue 1497859</a>.
+          The issue applies if you build on a RHEL 6 or CentOS 6 system with OpenSSL 1.0.0, and
+          run on a RHEL 6 or CentOS 6 system with OpenSSL 1.0.1.
+        </p>
+      </div>
+
+      <p class="p">
+        Along with specifying the version, you can also specify the allowed set of TLS ciphers
+        by using the <code class="ph codeph">--ssl_cipher_list</code> configuration setting. The argument to
+        this option is a list of keywords, separated by colons, commas, or spaces, and
+        optionally including other notation. For example:
+      </p>
+
+<pre class="pre codeblock"><code>
+--ssl_cipher_list="RC4-SHA,RC4-MD5"
+</code></pre>
+
+      <p class="p">
+        By default, the cipher list is empty, and Impala uses the default cipher list for
+        the underlying platform. See the output of <span class="keyword cmdname">man ciphers</span> for the full
+        set of keywords and notation allowed in the argument string.
+      </p>
+
+    </div>
+
+  </article>
+
 </article></main></body></html>
\ No newline at end of file
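
The "minimum allowed version" semantics of the three --ssl_minimum_version values can be sketched as a simple comparison. This is a minimal illustration under stated assumptions: the names MINIMUM_VERSIONS and version_is_allowed are hypothetical, versions are modelled as (major, minor) tuples, and nothing here reflects how the Impala daemons actually negotiate TLS.

```python
# Hypothetical model of the documented flag values: each value names the
# lowest TLS version that is accepted; anything at or above it is allowed.
MINIMUM_VERSIONS = {
    "tlsv1": (1, 0),    # documented default when TLS/SSL is enabled
    "tlsv1.1": (1, 1),
    "tlsv1.2": (1, 2),
}


def version_is_allowed(minimum_flag, negotiated):
    """Return True if the negotiated (major, minor) TLS version meets
    the minimum named by minimum_flag."""
    if minimum_flag not in MINIMUM_VERSIONS:
        raise ValueError(f"unsupported value: {minimum_flag!r}")
    # Tuples compare element by element, so (1, 2) >= (1, 1) is True.
    return negotiated >= MINIMUM_VERSIONS[minimum_flag]


if __name__ == "__main__":
    print(version_is_allowed("tlsv1.1", (1, 2)))
    print(version_is_allowed("tlsv1.2", (1, 1)))
```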

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_string_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_string_functions.html b/docs/build/html/topics/impala_string_functions.html
index b3e6dbd..a7a0559 100644
--- a/docs/build/html/topics/impala_string_functions.html
+++ b/docs/build/html/topics/impala_string_functions.html
@@ -63,6 +63,276 @@
 
       
 
+        <dt class="dt dlterm" id="string_functions__base64decode">
+          <code class="ph codeph">base64decode(string str)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Decodes a base64-encoded string and returns the original string value.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            For general information about Base64 encoding, see
+            <a class="xref" href="https://en.wikipedia.org/wiki/Base64" target="_blank">Base64 article on Wikipedia</a>.
+          </p>
+          <p class="p">
+        The functions <code class="ph codeph">base64encode()</code> and
+        <code class="ph codeph">base64decode()</code> are typically used
+        in combination, to store in an Impala table string data that is
+        problematic to store or transmit. For example, you could use
+        these functions to store string data that uses an encoding
+        other than UTF-8, or to transform the values in contexts that
+        require ASCII values, such as for partition key columns.
+        Keep in mind that base64-encoded values produce different results
+        for string functions such as <code class="ph codeph">LENGTH()</code>,
+        <code class="ph codeph">MAX()</code>, and <code class="ph codeph">MIN()</code> than when
+        those functions are called with the unencoded string values.
+      </p>
+          <p class="p">
+        The set of characters that can be generated as output
+        from <code class="ph codeph">base64encode()</code>, or specified in
+        the argument string to <code class="ph codeph">base64decode()</code>,
+        are the ASCII uppercase and lowercase letters (A-Z, a-z),
+        digits (0-9), and the punctuation characters
+        <code class="ph codeph">+</code>, <code class="ph codeph">/</code>, and <code class="ph codeph">=</code>.
+      </p>
+          <p class="p">
+        All return values produced by <code class="ph codeph">base64encode()</code>
+        are a multiple of 4 bytes in length. All argument values
+        supplied to <code class="ph codeph">base64decode()</code> must also be a
+        multiple of 4 bytes in length. If a base64-encoded value
+        would otherwise have a different length, it can be padded
+        with trailing <code class="ph codeph">=</code> characters to reach a length
+        that is a multiple of 4 bytes.
+      </p>
+          <p class="p">
+        If the argument string to <code class="ph codeph">base64decode()</code> does
+        not represent a valid base64-encoded value, subject to the
+        constraints of the Impala implementation such as the allowed
+        character set, the function returns <code class="ph codeph">NULL</code>.
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <div class="p">
+        The following examples show how to use <code class="ph codeph">base64encode()</code>
+        and <code class="ph codeph">base64decode()</code> together to store and retrieve
+        string values:
+<pre class="pre codeblock"><code>
+-- An arbitrary string can be encoded in base 64.
+-- The length of the output is a multiple of 4 bytes,
+-- padded with trailing = characters if necessary.
+select base64encode('hello world') as encoded,
+  length(base64encode('hello world')) as length;
++------------------+--------+
+| encoded          | length |
++------------------+--------+
+| aGVsbG8gd29ybGQ= | 16     |
++------------------+--------+
+
+-- Passing an encoded value to base64decode() produces
+-- the original value.
+select base64decode('aGVsbG8gd29ybGQ=') as decoded;
++-------------+
+| decoded     |
++-------------+
+| hello world |
++-------------+
+</code></pre>
+
+      These examples demonstrate incorrect encoded values that
+      produce <code class="ph codeph">NULL</code> return values when decoded:
+
+<pre class="pre codeblock"><code>
+-- The input value to base64decode() must be a multiple of 4 bytes.
+-- In this case, leaving off the trailing = padding character
+-- produces a NULL return value.
+select base64decode('aGVsbG8gd29ybGQ') as decoded;
++---------+
+| decoded |
++---------+
+| NULL    |
++---------+
+WARNINGS: UDF WARNING: Invalid base64 string; input length is 15,
+  which is not a multiple of 4.
+
+-- The input to base64decode() can only contain certain characters.
+-- The $ character in this case causes a NULL return value.
+select base64decode('abc$');
++----------------------+
+| base64decode('abc$') |
++----------------------+
+| NULL                 |
++----------------------+
+WARNINGS: UDF WARNING: Could not base64 decode input in space 4; actual output length 0
+</code></pre>
+
+      These examples demonstrate <span class="q">"round-tripping"</span> of an original string to an
+      encoded string, and back again. This technique is applicable if the original
+      source is in an unknown encoding, or if some intermediate processing stage
+      might cause national characters to be misrepresented:
+
+<pre class="pre codeblock"><code>
+select 'circumflex accents: â, ê, î, ô, û' as original,
+  base64encode('circumflex accents: â, ê, î, ô, û') as encoded;
++-----------------------------------+------------------------------------------------------+
+| original                          | encoded                                              |
++-----------------------------------+------------------------------------------------------+
+| circumflex accents: â, ê, î, ô, û | Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= |
++-----------------------------------+------------------------------------------------------+
+
+select base64encode('circumflex accents: â, ê, î, ô, û') as encoded,
+  base64decode(base64encode('circumflex accents: â, ê, î, ô, û')) as decoded;
++------------------------------------------------------+-----------------------------------+
+| encoded                                              | decoded                           |
++------------------------------------------------------+-----------------------------------+
+| Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= | circumflex accents: â, ê, î, ô, û |
++------------------------------------------------------+-----------------------------------+
+</code></pre>
+      </div>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__base64encode">
+          <code class="ph codeph">base64encode(string str)</code>
+        </dt>
+
+        <dd class="dd">
+          
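+          <strong class="ph b">Purpose:</strong> Encodes a string as a base64 value that can be passed to <code class="ph codeph">base64decode()</code>.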
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            For general information about Base64 encoding, see
+            <a class="xref" href="https://en.wikipedia.org/wiki/Base64" target="_blank">Base64 article on Wikipedia</a>.
+          </p>
+          <p class="p">
+        The functions <code class="ph codeph">base64encode()</code> and
+        <code class="ph codeph">base64decode()</code> are typically used
+        in combination, to store in an Impala table string data that is
+        problematic to store or transmit. For example, you could use
+        these functions to store string data that uses an encoding
+        other than UTF-8, or to transform the values in contexts that
+        require ASCII values, such as for partition key columns.
+        Keep in mind that base64-encoded values produce different results
+        for string functions such as <code class="ph codeph">LENGTH()</code>,
+        <code class="ph codeph">MAX()</code>, and <code class="ph codeph">MIN()</code> than when
+        those functions are called with the unencoded string values.
+      </p>
+          <p class="p">
+        The set of characters that can be generated as output
+        from <code class="ph codeph">base64encode()</code>, or specified in
+        the argument string to <code class="ph codeph">base64decode()</code>,
+        are the ASCII uppercase and lowercase letters (A-Z, a-z),
+        digits (0-9), and the punctuation characters
+        <code class="ph codeph">+</code>, <code class="ph codeph">/</code>, and <code class="ph codeph">=</code>.
+      </p>
+          <p class="p">
+        All return values produced by <code class="ph codeph">base64encode()</code>
+        are a multiple of 4 bytes in length. All argument values
+        supplied to <code class="ph codeph">base64decode()</code> must also be a
+        multiple of 4 bytes in length. If a base64-encoded value
+        would otherwise have a different length, it can be padded
+        with trailing <code class="ph codeph">=</code> characters to reach a length
+        that is a multiple of 4 bytes.
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <div class="p">
+        The following examples show how to use <code class="ph codeph">base64encode()</code>
+        and <code class="ph codeph">base64decode()</code> together to store and retrieve
+        string values:
+<pre class="pre codeblock"><code>
+-- An arbitrary string can be encoded in base 64.
+-- The length of the output is a multiple of 4 bytes,
+-- padded with trailing = characters if necessary.
+select base64encode('hello world') as encoded,
+  length(base64encode('hello world')) as length;
++------------------+--------+
+| encoded          | length |
++------------------+--------+
+| aGVsbG8gd29ybGQ= | 16     |
++------------------+--------+
+
+-- Passing an encoded value to base64decode() produces
+-- the original value.
+select base64decode('aGVsbG8gd29ybGQ=') as decoded;
++-------------+
+| decoded     |
++-------------+
+| hello world |
++-------------+
+</code></pre>
+
+      These examples demonstrate incorrect encoded values that
+      produce <code class="ph codeph">NULL</code> return values when decoded:
+
+<pre class="pre codeblock"><code>
+-- The input value to base64decode() must be a multiple of 4 bytes.
+-- In this case, leaving off the trailing = padding character
+-- produces a NULL return value.
+select base64decode('aGVsbG8gd29ybGQ') as decoded;
++---------+
+| decoded |
++---------+
+| NULL    |
++---------+
+WARNINGS: UDF WARNING: Invalid base64 string; input length is 15,
+  which is not a multiple of 4.
+
+-- The input to base64decode() can only contain certain characters.
+-- The $ character in this case causes a NULL return value.
+select base64decode('abc$');
++----------------------+
+| base64decode('abc$') |
++----------------------+
+| NULL                 |
++----------------------+
+WARNINGS: UDF WARNING: Could not base64 decode input in space 4; actual output length 0
+</code></pre>
+
+      These examples demonstrate <span class="q">"round-tripping"</span> of an original string to an
+      encoded string, and back again. This technique is applicable if the original
+      source is in an unknown encoding, or if some intermediate processing stage
+      might cause national characters to be misrepresented:
+
+<pre class="pre codeblock"><code>
+select 'circumflex accents: â, ê, î, ô, û' as original,
+  base64encode('circumflex accents: â, ê, î, ô, û') as encoded;
++-----------------------------------+------------------------------------------------------+
+| original                          | encoded                                              |
++-----------------------------------+------------------------------------------------------+
+| circumflex accents: â, ê, î, ô, û | Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= |
++-----------------------------------+------------------------------------------------------+
+
+select base64encode('circumflex accents: â, ê, î, ô, û') as encoded,
+  base64decode(base64encode('circumflex accents: â, ê, î, ô, û')) as decoded;
++------------------------------------------------------+-----------------------------------+
+| encoded                                              | decoded                           |
++------------------------------------------------------+-----------------------------------+
+| Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= | circumflex accents: â, ê, î, ô, û |
++------------------------------------------------------+-----------------------------------+
+</code></pre>
+      </div>
+        </dd>
+
+      
+
+      
+
         <dt class="dt dlterm" id="string_functions__btrim">
           <code class="ph codeph">btrim(string a)</code>,
           <code class="ph codeph">btrim(string a, string chars_to_trim)</code>
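
The base64 properties described above (output length is a multiple of 4, padding with trailing = characters, and rejection of malformed input) can be checked with the Python standard library. This is a sketch using Python's base64 module, not Impala's implementation. The helper name base64decode_or_none is hypothetical, and returning None stands in for the NULL result described in the docs.

```python
import base64
import binascii


def base64encode(text):
    """Encode a string as base64 text (UTF-8 bytes assumed)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def base64decode_or_none(encoded):
    """Strictly decode base64 text; return None for invalid input.

    validate=True makes Python reject characters outside the base64
    alphabet instead of silently skipping them.
    """
    try:
        return base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, ValueError):
        return None


if __name__ == "__main__":
    encoded = base64encode("hello world")
    print(encoded, len(encoded))
    print(base64decode_or_none(encoded))
    print(base64decode_or_none("aGVsbG8gd29ybGQ"))  # padding removed
    print(base64decode_or_none("abc$"))             # invalid character
```

The same helpers round-trip non-ASCII text such as the circumflex-accent string in the examples, because the string is converted to UTF-8 bytes before encoding.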

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_struct.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_struct.html b/docs/build/html/topics/impala_struct.html
index c7b02d0..e7d4c04 100644
--- a/docs/build/html/topics/impala_struct.html
+++ b/docs/build/html/topics/impala_struct.html
@@ -130,7 +130,7 @@ type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword
           </p>
         </li>
         <li class="li">
-          <p class="p" id="struct__d6e3003">
+          <p class="p" id="struct__d6e3156">
             The maximum length of the column definition for any complex type, including declarations for any nested types,
             is 4000 characters.
           </p>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_subqueries.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_subqueries.html b/docs/build/html/topics/impala_subqueries.html
index 2be2880..280d60c 100644
--- a/docs/build/html/topics/impala_subqueries.html
+++ b/docs/build/html/topics/impala_subqueries.html
@@ -171,6 +171,13 @@ SELECT x FROM t1 WHERE y &gt; (SELECT count(z) FROM t2);
 </code></pre>
 
     <p class="p">
+        The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+        block containing the hint. It does not affect the join order of nested queries, such as views,
+        inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+        tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+      </p>
+
+    <p class="p">
         <strong class="ph b">Internal details:</strong>
       </p>
 
@@ -287,6 +294,15 @@ SELECT x FROM t1 WHERE y &gt; (SELECT count(z) FROM t2);
           when referring to any column from the outer query block within a subquery.
         </p>
       </li>
+      <li class="li">
+        <p class="p">
+        The <code class="ph codeph">TABLESAMPLE</code> clause of the <code class="ph codeph">SELECT</code>
+        statement does not apply to a table reference derived from a view, a subquery,
+        or anything other than a real base table. This clause only works for tables
+        backed by HDFS or HDFS-like data files; therefore, it does not apply to Kudu or
+        HBase tables.
+      </p>
+      </li>
     </ul>
 
     <p class="p">

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_tablesample.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_tablesample.html b/docs/build/html/topics/impala_tablesample.html
new file mode 100644
index 0000000..7a4f767
--- /dev/null
+++ b/docs/build/html/topics/impala_tablesample.html
@@ -0,0 +1,554 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="tablesample"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>TABLESAMPLE Clause</title></head><body id="tablesample"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">TABLESAMPLE Clause</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Specify the <code class="ph codeph">TABLESAMPLE</code> clause in cases where you need
+      to explore the data distribution within the table, the table is very large,
+      and it is impractical or unnecessary to process all the data from the table
+      or selected partitions.
+    </p>
+
+    <p class="p">
+      The clause makes the query process a randomized set of data files from the
+      table, so that the total volume of data is greater than or equal to the specified
+      percentage of data bytes within that table, or within the set of
+      partitions that remain after partition pruning is performed.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>
+  <span class="ph">TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) [REPEATABLE(<var class="keyword varname">seed</var>)]</span>
+</code></pre>
+
+    <p class="p">
+      The <code class="ph codeph">TABLESAMPLE</code> clause comes immediately after a table name or table alias.
+    </p>
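+
+    <p class="p">
+      As a rough sketch of the placement, assuming hypothetical tables named
+      <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code> (the names are
+      illustrative only), the clause might appear as follows:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- t1 and t2 are hypothetical tables used only for illustration.
+-- The clause follows the table name...
+select count(*) from t1 tablesample system(10);
+
+-- ...or the table alias, if one is specified.
+select count(*) from t2 as a tablesample system(10) repeatable(1234);
+</code></pre>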
+
+    <p class="p">
+      The <code class="ph codeph">SYSTEM</code> keyword represents the sampling method. Currently,
+      Impala only supports a single sampling method named <code class="ph codeph">SYSTEM</code>.
+    </p>
+
+    <p class="p">
+      The <var class="keyword varname">percentage</var> argument is an integer literal from 0 to 100.
+      A percentage of 0 produces an empty result set for a particular table reference,
+      while a percentage of 100 uses the entire contents. Because the sampling works by
+      selecting a random set of data files, the proportion of sampled data from the
+      table may be greater than the specified percentage, based on the number and sizes
+      of the underlying data files. See the usage notes for details.
+    </p>
+
+    <p class="p">
+      The optional <code class="ph codeph">REPEATABLE</code> keyword lets you specify an arbitrary
+      positive integer seed value that ensures that when the query is run again, the
+      sampling selects the same set of data files each time. <code class="ph codeph">REPEATABLE</code>
+      does not have a default value. If you omit the <code class="ph codeph">REPEATABLE</code> keyword,
+      the random seed is derived from the current time.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      You might use this clause with aggregation queries, such as finding
+      the approximate average, minimum, or maximum where exact precision
+      is not required. You can use these findings to plan the most effective
+      strategy for constructing queries against the full table or designing
+      a partitioning strategy for the data.
+    </p>
+
+    <p class="p">
+      Some other database systems have a <code class="ph codeph">TABLESAMPLE</code> clause.
+      The Impala syntax for this clause is modeled on the syntax for popular
+      relational databases, not the Hive <code class="ph codeph">TABLESAMPLE</code> clause.
+      For example, there is no <code class="ph codeph">BUCKETS</code> keyword as in HiveQL.
+    </p>
+
+    <p class="p">
+      The precision of the <var class="keyword varname">percentage</var> threshold depends on
+      the number and sizes of the underlying data files. Impala brings in
+      additional data files, one at a time, until the number of bytes reaches or exceeds
+      the specified percentage based on the total number of bytes for the
+      entire set of table data. The precision of the percentage threshold is higher
+      when the table contains many data files with consistent sizes. See the
+      code listings later in this section for examples.
+    </p>
+
+    <p class="p">
+      When you estimate characteristics of the data distribution based on sampling
+      a percentage of the table data, be aware that the data might be unevenly distributed
+      between different files. Do not assume that the percentage figure reflects the
+      percentage of rows in the table. For example, one file might contain all blank values
+      for a <code class="ph codeph">STRING</code> column, while another file contains long strings
+      in that column; therefore, one file could contain many more rows than another.
+      Likewise, a table created with the <code class="ph codeph">SORT BY</code> clause might
+      contain narrow ranges of values for the sort columns, making it impractical to
+      extrapolate the number of distinct values for those columns based on sampling
+      only some of the data files.
+    </p>
+
+    <p class="p">
+      Because a sample of the table data might not contain all values for a particular
+      column, if the <code class="ph codeph">TABLESAMPLE</code> clause is used in a join query, the
+      key relationships between the tables might produce incomplete result sets
+      compared to joins using all the table data. For example, if you join 50%
+      of table A with 50% of table B, some values in the join columns might
+      not match between the two tables, even though overall there is a 1:1
+      relationship between the tables.
+    </p>
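+
+    <p class="p">
+      A minimal sketch of such a join, assuming hypothetical tables
+      <code class="ph codeph">table_a</code> and <code class="ph codeph">table_b</code> that are
+      related 1:1 through a hypothetical <code class="ph codeph">id</code> column. The count
+      returned could be lower than for the same join without sampling, and could differ
+      from run to run:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- table_a, table_b, and the id column are hypothetical names.
+-- Each table reference is sampled independently, so some id values
+-- present in one sample might be absent from the other.
+select count(*)
+  from table_a a tablesample system(50)
+  join table_b b tablesample system(50)
+  on a.id = b.id;
+</code></pre>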
+
+    <p class="p">
+      The <code class="ph codeph">REPEATABLE</code> keyword makes identical queries use a
+      consistent set of data files when the query is repeated. You specify an
+      arbitrary integer key that acts as a seed value when Impala randomly
+      selects the set of data files to use in the query. This technique
+      lets you verify correctness, examine performance, and so on for queries
+      using the <code class="ph codeph">TABLESAMPLE</code> clause without the sampled data
+      being different each time. The repeatable aspect is reset (that is, the
+      set of selected data files may change) any time the contents of the table
+      change. The statements or operations that can make sampling results
+      non-repeatable are:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <code class="ph codeph">INSERT</code>.
+      </li>
+      <li class="li">
+        <code class="ph codeph">TRUNCATE TABLE</code>.
+      </li>
+      <li class="li">
+        <code class="ph codeph">LOAD DATA</code>.
+      </li>
+      <li class="li">
+        <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code>
+        after files are added or removed by a non-Impala mechanism.
+      </li>
+    </ul>
+
+    <p class="p">
+      This clause is similar in some ways to the <code class="ph codeph">LIMIT</code> clause,
+      because both serve to limit the size of the intermediate data and final
+      result set. <code class="ph codeph">LIMIT 0</code> is more efficient than
+      <code class="ph codeph">TABLESAMPLE SYSTEM(0)</code> for verifying that a query can execute
+      without producing any results. <code class="ph codeph">TABLESAMPLE SYSTEM(<var class="keyword varname">n</var>)</code>
+      often makes query processing more efficient than using a <code class="ph codeph">LIMIT</code> clause
+      by itself, because all phases of query execution use less data overall.
+      If the intent is to retrieve some representative values from the table
+      in an efficient way, you might combine <code class="ph codeph">TABLESAMPLE</code>,
+      <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">LIMIT</code> clauses within a single query.
+    </p>
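+
+    <p class="p">
+      One way to combine the three clauses, sketched here against the
+      <code class="ph codeph">sample_demo</code> table that is created in the examples
+      for this topic. The percentage and limit values are arbitrary choices for illustration:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Sample roughly half of the data bytes, then return a few
+-- values in a predictable order. The 50 and 3 are arbitrary.
+select x, s from sample_demo tablesample system(50)
+  order by x limit 3;
+</code></pre>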
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong>
+      </p>
+    <p class="p">
+      When you query a partitioned table, any partition pruning happens
+      before Impala selects the data files to sample. For example, in a
+      table partitioned by year, a query with <code class="ph codeph">WHERE year = 2017</code>
+      and a <code class="ph codeph">TABLESAMPLE SYSTEM(10)</code> clause would sample
+      data files representing at least 10% of the bytes present in the
+      2017 partition.
+    </p>
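+
+    <p class="p">
+      The scenario above might be written as follows, assuming a hypothetical table
+      named <code class="ph codeph">sales</code> that is partitioned by a
+      <code class="ph codeph">year</code> column and has a hypothetical
+      <code class="ph codeph">amount</code> column:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- sales and amount are hypothetical names used only for illustration.
+-- Partition pruning on year happens first; the 10% threshold is then
+-- computed from the bytes in the remaining partition.
+select avg(amount) from sales tablesample system(10)
+  where year = 2017;
+</code></pre>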
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+      This clause applies to S3 tables the same way as tables
+      with data files stored on HDFS.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">ADLS considerations:</strong>
+      </p>
+    <p class="p">
+      This clause applies to ADLS tables the same way as tables
+      with data files stored on HDFS.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+      This clause does not apply to Kudu tables.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+    <p class="p">
+      This clause does not apply to HBase tables.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Performance considerations:</strong>
+      </p>
+    <p class="p">
+      From a performance perspective, the <code class="ph codeph">TABLESAMPLE</code>
+      clause is especially valuable for exploratory queries on
+      text, Avro, or other file formats besides Parquet. Queries against
+      text-based or row-oriented file formats must process substantial amounts of
+      redundant data to derive aggregate results such as
+      <code class="ph codeph">MAX()</code>, <code class="ph codeph">MIN()</code>, or <code class="ph codeph">AVG()</code>
+      for a single column. Therefore, you might use <code class="ph codeph">TABLESAMPLE</code>
+      early in the ETL pipeline, when data is still in raw text format
+      and has not been converted to Parquet or moved into a partitioned
+      table.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      This clause applies only to tables that use a storage layer
+      with underlying raw data files, such as HDFS, Amazon S3,
+      or Microsoft ADLS.
+    </p>
+
+    <p class="p">
+      This clause does not apply to table references that represent views or subqueries.
+      A query that applies the <code class="ph codeph">TABLESAMPLE</code> clause to a
+      view or a subquery fails with a semantic error.
+    </p>
+
+    <p class="p">
+      Because the sampling works at the level of entire data files, it
+      is by nature coarse-grained. It is possible to specify a small
+      sample percentage but still process a substantial portion of the
+      table data if the table contains relatively few data files, if
+      each data file is very large, or if the data files vary substantially
+      in size. Be sure that you understand the data distribution and physical
+      file layout so that you can verify whether the results are suitable for
+      extrapolation. For example, if the table contains only a single data file,
+      the <span class="q">"sample"</span> will consist of all the table data regardless of
+      the percentage you specify. If the table contains data files of
+      1 GiB, 1 GiB, and 1 KiB, when you specify a sampling percentage of
+      50 you would either process slightly more than 50% of the table
+      (1 GiB + 1 KiB) or almost the entire table (1 GiB + 1 GiB),
+      depending on which data files were selected for sampling.
+    </p>
+
+    <p class="p">
+      If data files are added by a non-Impala mechanism, and the
+      table metadata is not updated by a <code class="ph codeph">REFRESH</code>
+      or <code class="ph codeph">INVALIDATE METADATA</code> statement, the
+      <code class="ph codeph">TABLESAMPLE</code> clause does not consider those
+      new files when computing the number of bytes in the table
+      or selecting which files to sample.
+    </p>
+
+    <p class="p">
+      If data files are removed by a non-Impala mechanism, and the
+      table metadata is not updated by a <code class="ph codeph">REFRESH</code>
+      or <code class="ph codeph">INVALIDATE METADATA</code> statement, the
+      query fails if the <code class="ph codeph">TABLESAMPLE</code> clause
+      attempts to reference any of the missing files.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following examples demonstrate the <code class="ph codeph">TABLESAMPLE</code> clause.
+      These examples intentionally use very small data sets to illustrate how
+      the number of files, size of each file, and overall size of data in the table
+      interact with the percentage specified in the clause.
+    </p>
+
+    <p class="p">
+      These examples use an unpartitioned table, containing several files of roughly
+      the same size:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table sample_demo (x int, s string);
+
+insert into sample_demo values (1, 'one');
+insert into sample_demo values (2, 'two');
+insert into sample_demo values (3, 'three');
+insert into sample_demo values (4, 'four');
+insert into sample_demo values (5, 'five');
+
+show files in sample_demo;
++---------------------+------+-----------+
+| Path                | Size | Partition |
++---------------------+------+-----------+
+| 991213608_data.0.   | 7B   |           |
+| 982196806_data.0.   | 6B   |           |
+| _2122096884_data.0. | 8B   |           |
+| _586325431_data.0.  | 6B   |           |
+| 1894746258_data.0.  | 7B   |           |
++---------------------+------+-----------+
+
+show table stats sample_demo;
++-------+--------+------+--------+-------------------------+
+| #Rows | #Files | Size | Format | Location                |
++-------+--------+------+--------+-------------------------+
+| -1    | 5      | 34B  | TEXT   | /tsample.db/sample_demo |
++-------+--------+------+--------+-------------------------+
+</code></pre>
+
+    <p class="p">
+      A query that samples 50% of the table must process at least
+      17 bytes of data. Based on the sizes of the data files,
+      we can predict that each such query uses 3 arbitrary files.
+      Any 1 or 2 files are not enough to reach 50% of the total
+      data in the table (34 bytes), so the query adds more files
+      until it passes the 50% threshold:
+    </p>
+
+<pre class="pre codeblock"><code>
+select distinct x from sample_demo tablesample system(50);
++---+
+| x |
++---+
+| 4 |
+| 1 |
+| 5 |
++---+
+
+select distinct x from sample_demo tablesample system(50);
++---+
+| x |
++---+
+| 5 |
+| 4 |
+| 2 |
++---+
+
+select distinct x from sample_demo tablesample system(50);
++---+
+| x |
++---+
+| 5 |
+| 3 |
+| 2 |
++---+
+</code></pre>
+
+    <p class="p">
+      To help run reproducible experiments, the <code class="ph codeph">REPEATABLE</code>
+      clause causes Impala to choose the same set of files for each query.
+      Although the data set being considered is deterministic, the order
+      of results varies (in the absence of an <code class="ph codeph">ORDER BY</code>
+      clause) because of the way distributed queries are processed:
+    </p>
+
+<pre class="pre codeblock"><code>
+select distinct x from sample_demo
+  tablesample system(50) repeatable (12345);
++---+
+| x |
++---+
+| 3 |
+| 2 |
+| 1 |
++---+
+
+select distinct x from sample_demo
+  tablesample system(50) repeatable (12345);
++---+
+| x |
++---+
+| 2 |
+| 1 |
+| 3 |
++---+
+</code></pre>
+
+    <p class="p">
+      The following examples show how uneven data distribution affects
+      which data is sampled. Adding another data file containing a long
+      string value changes the threshold for 50% of the total data in
+      the table:
+    </p>
+
+<pre class="pre codeblock"><code>
+insert into sample_demo values (1000, 'Boyhood is the longest time in li
+fe for a boy. The last term of the school-year is made of decades, not o
+f weeks, and living through them is like waiting for the millennium. Boo
+th Tarkington');
+
+show files in sample_demo;
++---------------------+------+-----------+
+| Path                | Size | Partition |
++---------------------+------+-----------+
+| 991213608_data.0.   | 7B   |           |
+| 982196806_data.0.   | 6B   |           |
+| _253317650_data.0.  | 196B |           |
+| _2122096884_data.0. | 8B   |           |
+| _586325431_data.0.  | 6B   |           |
+| 1894746258_data.0.  | 7B   |           |
++---------------------+------+-----------+
+
+show table stats sample_demo;
++-------+--------+------+--------+-------------------------+
+| #Rows | #Files | Size | Format | Location                |
++-------+--------+------+--------+-------------------------+
+| -1    | 6      | 230B | TEXT   | /tsample.db/sample_demo |
++-------+--------+------+--------+-------------------------+
+</code></pre>
+
+    <p class="p">
+      Even though the queries do not refer to the <code class="ph codeph">S</code>
+      column containing the long value, all the sampling queries include
+      the data file containing the column value <code class="ph codeph">X=1000</code>,
+      because the query cannot reach the 50% threshold (115 bytes) without
+      including that file. The large file might be considered first, in which
+      case it is the only file processed by the query. Or an arbitrary
+      set of other files might be considered first.
+    </p>
+
+<pre class="pre codeblock"><code>
+select distinct x from sample_demo tablesample system(50);
++------+
+| x    |
++------+
+| 1000 |
+| 3    |
+| 1    |
++------+
+
+select distinct x from sample_demo tablesample system(50);
++------+
+| x    |
++------+
+| 1000 |
++------+
+
+select distinct x from sample_demo tablesample system(50);
++------+
+| x    |
++------+
+| 1000 |
+| 4    |
+| 2    |
+| 1    |
++------+
+</code></pre>
+
+    <p class="p">
+      The following examples demonstrate how the <code class="ph codeph">TABLESAMPLE</code>
+      clause interacts with other table aspects, such as partitioning and file
+      format:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table sample_demo_partitions (x int, s string) partitioned by (n int) stored as parquet;
+
+insert into sample_demo_partitions partition (n = 1) select * from sample_demo;
+insert into sample_demo_partitions partition (n = 2) select * from sample_demo;
+insert into sample_demo_partitions partition (n = 3) select * from sample_demo;
+
+show files in sample_demo_partitions;
++--------------------------------+--------+-----------+
+| Path                           | Size   | Partition |
++--------------------------------+--------+-----------+
+| 000000_364262785_data.0.parq   | 1.24KB | n=1       |
+| 000001_973526736_data.0.parq   | 566B   | n=1       |
+| 0000000_1300598134_data.0.parq | 1.24KB | n=2       |
+| 0000001_689099063_data.0.parq  | 568B   | n=2       |
+| 0000000_1861371709_data.0.parq | 1.24KB | n=3       |
+| 0000001_1065507912_data.0.parq | 566B   | n=3       |
++--------------------------------+--------+-----------+
+
+show table stats sample_demo_partitions;
++-------+-------+--------+--------+---------+----------------------------------------+
+| n     | #Rows | #Files | Size   | Format  | Location                               |
++-------+-------+--------+--------+---------+----------------------------------------+
+| 1     | -1    | 2      | 1.79KB | PARQUET | /tsample.db/sample_demo_partitions/n=1 |
+| 2     | -1    | 2      | 1.80KB | PARQUET | /tsample.db/sample_demo_partitions/n=2 |
+| 3     | -1    | 2      | 1.79KB | PARQUET | /tsample.db/sample_demo_partitions/n=3 |
+| Total | -1    | 6      | 5.39KB |         |                                        |
++-------+-------+--------+--------+---------+----------------------------------------+
+</code></pre>
+
+    <p class="p">
+      If the query does not involve any partition pruning, the
+      sampling applies to the data volume of the entire table:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- 18 rows total.
+select count(*) from sample_demo_partitions;
++----------+
+| count(*) |
++----------+
+| 18       |
++----------+
+
+-- The number of rows per data file is not
+-- perfectly balanced, therefore the count
+-- is different depending on which set of files
+-- is considered.
+select count(*) from sample_demo_partitions
+  tablesample system(75);
++----------+
+| count(*) |
++----------+
+| 14       |
++----------+
+
+select count(*) from sample_demo_partitions
+  tablesample system(75);
++----------+
+| count(*) |
++----------+
+| 16       |
++----------+
+</code></pre>
+
+    <p class="p">
+      If the query only processes certain partitions,
+      the query computes the sampling threshold based on
+      the data size and set of files only from the
+      relevant partitions:
+    </p>
+
+<pre class="pre codeblock"><code>
+select count(*) from sample_demo_partitions
+  tablesample system(50) where n = 1;
++----------+
+| count(*) |
++----------+
+| 6        |
++----------+
+
+select count(*) from sample_demo_partitions
+  tablesample system(50) where n = 1;
++----------+
+| count(*) |
++----------+
+| 2        |
++----------+
+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_upgrading.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_upgrading.html b/docs/build/html/topics/impala_upgrading.html
index 25e8397..fcd43ac 100644
--- a/docs/build/html/topics/impala_upgrading.html
+++ b/docs/build/html/topics/impala_upgrading.html
@@ -95,8 +95,8 @@ $ ps ax | grep [c]atalogd
               <code class="ph codeph">impalad</code> if the service started successfully.
 <pre class="pre codeblock"><code>$ sudo service impala-server start
 $ ps ax | grep [i]mpalad
- 7936 ?        Sl     0:12 /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala -state_store_port=24000 -use_statestore
--state_store_host=127.0.0.1 -be_port=22000
+ 7936 ?        Sl     0:12 /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala -state_store_port=24000
+ -state_store_host=127.0.0.1 -be_port=22000
 </code></pre>
             </li>
           </ol>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ca9005be/docs/build/html/topics/impala_views.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_views.html b/docs/build/html/topics/impala_views.html
index ca6595a..fa1dd0e 100644
--- a/docs/build/html/topics/impala_views.html
+++ b/docs/build/html/topics/impala_views.html
@@ -202,6 +202,13 @@ Query finished, fetching results ...
       </p>
 
     <p class="p">
+        The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+        block containing the hint. It does not affect the join order of nested queries, such as views,
+        inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+        tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+      </p>
+
+    <p class="p">
         <strong class="ph b">Restrictions:</strong>
       </p>
 
@@ -274,6 +281,15 @@ Query finished, fetching results ...
 </code></pre>
       </div>
       </li>
+      <li class="li">
+        <p class="p">
+        The <code class="ph codeph">TABLESAMPLE</code> clause of the <code class="ph codeph">SELECT</code>
+        statement does not apply to a table reference derived from a view, a subquery,
+        or anything other than a real base table. This clause only works for tables
+        backed by HDFS or HDFS-like data files; therefore, it does not apply to Kudu or
+        HBase tables.
+      </p>
+      </li>
     </ul>
 
     <p class="p">