Posted to commits@impala.apache.org by ta...@apache.org on 2018/04/19 18:12:14 UTC
[3/8] impala git commit: IMPALA-5310: [DOCS] Document TABLESAMPLE clause for COMPUTE STATS
IMPALA-5310: [DOCS] Document TABLESAMPLE clause for COMPUTE STATS
Change-Id: I214b63db391bd35562f5ea9091508005f83b2fcc
Reviewed-on: http://gerrit.cloudera.org:8080/8975
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/ddc795d8
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/ddc795d8
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/ddc795d8
Branch: refs/heads/master
Commit: ddc795d86a909770490d28ed84156c42757c2821
Parents: bfb2131
Author: John Russell <jr...@cloudera.com>
Authored: Mon Jan 8 14:41:16 2018 -0800
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Thu Apr 19 03:06:04 2018 +0000
----------------------------------------------------------------------
docs/topics/impala_compute_stats.xml | 30 ++++++++++++++++++++++--------
docs/topics/impala_tablesample.xml | 6 ++++++
2 files changed, 28 insertions(+), 8 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/ddc795d8/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index b62972c..95343f4 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -39,18 +39,20 @@ under the License.
<conbody>
<p>
- <indexterm audience="hidden">COMPUTE STATS statement</indexterm>
- Gathers information about volume and distribution of data in a table and all associated columns and
- partitions. The information is stored in the metastore database, and used by Impala to help optimize queries.
- For example, if Impala can determine that a table is large or small, or has many or few distinct values it
- can organize parallelize the work appropriately for a join query or insert operation. For details about the
- kinds of information gathered by this statement, see <xref href="impala_perf_stats.xml#perf_stats"/>.
+ <indexterm audience="hidden">COMPUTE STATS statement</indexterm> The
+ COMPUTE STATS statement gathers information about volume and distribution
+ of data in a table and all associated columns and partitions. The
+ information is stored in the metastore database, and used by Impala to
+ help optimize queries. For example, if Impala can determine that a table
+ is large or small, or has many or few distinct values, it can organize and
+ parallelize the work appropriately for a join query or insert operation.
+ For details about the kinds of information gathered by this statement, see
+ <xref href="impala_perf_stats.xml#perf_stats"/>.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
-<codeblock rev="impala-3562">COMPUTE STATS
- [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ]
+<codeblock rev="2.1.0"><ph rev="2.12.0 IMPALA-5310">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ] [TABLESAMPLE SYSTEM(<varname>percentage</varname>) [REPEATABLE(<varname>seed</varname>)]]</ph>
<varname>column_list</varname> ::= <varname>column_name</varname> [ , <varname>column_name</varname>, ... ]
@@ -104,6 +106,18 @@ COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varn
STATS</codeph>.
</p>
+ <p rev="2.12.0 IMPALA-5310">
+ In <keyword keyref="impala212_full"/> and
+ higher, an optional <codeph>TABLESAMPLE</codeph> clause immediately after
+ a table reference specifies that the <codeph>COMPUTE STATS</codeph>
+ operation only processes a specified percentage of the table data. For
+ tables that are so large that a full <codeph>COMPUTE STATS</codeph>
+ operation is impractical, you can use <codeph>COMPUTE STATS</codeph> with
+ a <codeph>TABLESAMPLE</codeph> clause to extrapolate statistics from a
+ sample of the table data. See <keyword keyref="perf_stats"/> for details about the
+ experimental stats extrapolation and sampling features.
+ </p>
+
<p rev="2.1.0">
The <codeph>COMPUTE INCREMENTAL STATS</codeph> variation is a shortcut for partitioned tables that works on a
subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
http://git-wip-us.apache.org/repos/asf/impala/blob/ddc795d8/docs/topics/impala_tablesample.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_tablesample.xml b/docs/topics/impala_tablesample.xml
index f60c5be..e5123cb 100644
--- a/docs/topics/impala_tablesample.xml
+++ b/docs/topics/impala_tablesample.xml
@@ -81,6 +81,12 @@ under the License.
<p conref="../shared/impala_common.xml#common/added_in_290"/>
+ <p rev="2.12.0 IMPALA-5310">
+ See <keyword keyref="compute_stats"/> for the
+ <codeph>TABLESAMPLE</codeph> clause used in the <codeph>COMPUTE
+ STATS</codeph> statement.
+ </p>
+
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
<p>