Posted to commits@impala.apache.org by sa...@apache.org on 2018/04/23 17:38:53 UTC

[05/20] impala git commit: IMPALA-5310: [DOCS] Document TABLESAMPLE clause for COMPUTE STATS

IMPALA-5310: [DOCS] Document TABLESAMPLE clause for COMPUTE STATS

Change-Id: I214b63db391bd35562f5ea9091508005f83b2fcc
Reviewed-on: http://gerrit.cloudera.org:8080/8975
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/0ec3cd71
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/0ec3cd71
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/0ec3cd71

Branch: refs/heads/2.x
Commit: 0ec3cd71071ce623dcf9eb919dfca639f91a5bc7
Parents: 5f4d89f
Author: John Russell <jr...@cloudera.com>
Authored: Mon Jan 8 14:41:16 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:21 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_compute_stats.xml | 30 ++++++++++++++++++++++--------
 docs/topics/impala_tablesample.xml   |  6 ++++++
 2 files changed, 28 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/0ec3cd71/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index b62972c..95343f4 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -39,18 +39,20 @@ under the License.
   <conbody>
 
     <p>
-      <indexterm audience="hidden">COMPUTE STATS statement</indexterm>
-      Gathers information about volume and distribution of data in a table and all associated columns and
-      partitions. The information is stored in the metastore database, and used by Impala to help optimize queries.
-      For example, if Impala can determine that a table is large or small, or has many or few distinct values it
-      can organize parallelize the work appropriately for a join query or insert operation. For details about the
-      kinds of information gathered by this statement, see <xref href="impala_perf_stats.xml#perf_stats"/>.
+      <indexterm audience="hidden">COMPUTE STATS statement</indexterm> The
+      COMPUTE STATS statement gathers information about volume and distribution
+      of data in a table and all associated columns and partitions. The
+      information is stored in the metastore database, and used by Impala to
+      help optimize queries. For example, if Impala can determine that a table
+      is large or small, or has many or few distinct values, it can organize and
+      parallelize the work appropriately for a join query or insert operation.
+      For details about the kinds of information gathered by this statement, see
+        <xref href="impala_perf_stats.xml#perf_stats"/>.
     </p>
 
     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
 
-<codeblock rev="impala-3562">COMPUTE STATS
-  [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ]
+<codeblock rev="2.1.0"><ph rev="2.12.0 IMPALA-5310">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname>  [ ( <varname>column_list</varname> ) ] [TABLESAMPLE SYSTEM(<varname>percentage</varname>) [REPEATABLE(<varname>seed</varname>)]]</ph>
 
 <varname>column_list</varname> ::= <varname>column_name</varname> [ , <varname>column_name</varname>, ... ]
 
@@ -104,6 +106,18 @@ COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varn
         STATS</codeph>.
     </p>
 
+    <p rev="2.12.0 IMPALA-5310">
+      In <keyword keyref="impala212_full"/> and
+      higher, an optional <codeph>TABLESAMPLE</codeph> clause immediately after
+      a table reference specifies that the <codeph>COMPUTE STATS</codeph>
+      operation only processes a specified percentage of the table data. For
+      tables that are so large that a full <codeph>COMPUTE STATS</codeph>
+      operation is impractical, you can use <codeph>COMPUTE STATS</codeph> with
+      a <codeph>TABLESAMPLE</codeph> clause to extrapolate statistics from a
+      sample of the table data. See <keyword keyref="perf_stats"/> for details about the
+      experimental stats extrapolation and sampling features.
+    </p>
+
     <p rev="2.1.0">
       The <codeph>COMPUTE INCREMENTAL STATS</codeph> variation is a shortcut for partitioned tables that works on a
       subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables

http://git-wip-us.apache.org/repos/asf/impala/blob/0ec3cd71/docs/topics/impala_tablesample.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_tablesample.xml b/docs/topics/impala_tablesample.xml
index f60c5be..e5123cb 100644
--- a/docs/topics/impala_tablesample.xml
+++ b/docs/topics/impala_tablesample.xml
@@ -81,6 +81,12 @@ under the License.
 
     <p conref="../shared/impala_common.xml#common/added_in_290"/>
 
+    <p rev="2.12.0 IMPALA-5310">
+      See <keyword keyref="compute_stats"/> for the
+        <codeph>TABLESAMPLE</codeph> clause used in the <codeph>COMPUTE
+        STATS</codeph> statement.
+    </p>
+
     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
 
     <p>