Posted to commits@impala.apache.org by ar...@apache.org on 2018/10/31 18:34:37 UTC
[2/4] impala git commit: IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog
IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog
Documented the new --pull_incremental_statistics setting in the Incremental Stats section.
Change-Id: I8fd9b88138350406065df2f39a48043178759949
Reviewed-on: http://gerrit.cloudera.org:8080/11790
Reviewed-by: Greg Rahn <gr...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/d4c0ce32
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/d4c0ce32
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/d4c0ce32
Branch: refs/heads/master
Commit: d4c0ce32a67a3f8d7fd4b8e92e42f6d4567d8db2
Parents: dcc4024
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Oct 25 11:32:27 2018 -0700
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Wed Oct 31 00:25:45 2018 +0000
----------------------------------------------------------------------
docs/shared/impala_common.xml | 16 ++---
docs/topics/impala_perf_stats.xml | 106 ++++++++++++++++++++-------------
2 files changed, 75 insertions(+), 47 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/d4c0ce32/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index a45f802..8b79596 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1422,13 +1422,15 @@ drop database temp;
for the first time on a given table.
</p>
- <p id="incremental_stats_caveats">
- For a table with a huge number of partitions and many columns, the approximately 400 bytes
- of metadata per column per partition can add up to significant memory overhead, as it must
- be cached on the <cmdname>catalogd</cmdname> host and on every <cmdname>impalad</cmdname> host
- that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
- you might experience service downtime.
- </p>
+ <p id="incremental_stats_caveats"> In Impala 3.0 and lower, approximately
+ 400 bytes of metadata per column per partition are needed for caching.
+ Tables with a large number of partitions and many columns can add up to
+ significant memory overhead, because the metadata must be cached on the
+ <cmdname>catalogd</cmdname> host and on every
+ <cmdname>impalad</cmdname> host that is eligible to be a coordinator.
+ If this metadata for all tables combined exceeds 2 GB, you might
+ experience service downtime. In Impala 3.1 and higher, this issue is
+ alleviated by improved handling of incremental stats.</p>
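The caveat above is simple arithmetic, so a quick estimate can show whether a set of tables is near the 2 GB threshold. The following is a minimal Python sketch, not an Impala tool: it assumes the ~400 bytes per column per partition figure applies uniformly, reads "2 GB" as 2 * 1024**3 bytes, and assumes catalogd runs on its own host when multiplying out the cluster-wide footprint. The table sizes in the example are hypothetical.

```python
# Back-of-the-envelope sizing for incremental stats in Impala 3.0 and lower.
# Assumptions: ~400 bytes per column per partition (the documented estimate),
# applied uniformly; "2 GB" read as 2 * 1024**3 bytes.
from typing import Iterable, Tuple

BYTES_PER_COLUMN_PER_PARTITION = 400
TWO_GB = 2 * 1024**3


def table_stats_bytes(num_columns: int, num_partitions: int) -> int:
    """Approximate incremental stats metadata for one table."""
    return BYTES_PER_COLUMN_PER_PARTITION * num_columns * num_partitions


def total_stats_bytes(tables: Iterable[Tuple[int, int]]) -> int:
    """Sum the estimate over (num_columns, num_partitions) pairs."""
    return sum(table_stats_bytes(cols, parts) for cols, parts in tables)


def cluster_stats_bytes(per_host_bytes: int, num_coordinators: int) -> int:
    """The metadata is cached on catalogd and on every coordinator-eligible
    impalad, so the cluster-wide footprint scales with the host count
    (assumes catalogd is on a separate host)."""
    return per_host_bytes * (1 + num_coordinators)


if __name__ == "__main__":
    tables = [(200, 30_000), (50, 1_000)]  # hypothetical (columns, partitions)
    per_host = total_stats_bytes(tables)
    print(per_host, per_host > TWO_GB)  # prints: 2420000000 True
    print(cluster_stats_bytes(per_host, num_coordinators=3))  # prints: 9680000000
```

A single wide table with many partitions (200 columns by 30,000 partitions here) is enough to cross the threshold on its own, which is why the caveat singles out tables with many partitions and columns.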
<p id="incremental_partition_spec">
The <codeph>PARTITION</codeph> clause is only allowed in combination with the <codeph>INCREMENTAL</codeph>
http://git-wip-us.apache.org/repos/asf/impala/blob/d4c0ce32/docs/topics/impala_perf_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_stats.xml b/docs/topics/impala_perf_stats.xml
index 15a00f7..861aba3 100644
--- a/docs/topics/impala_perf_stats.xml
+++ b/docs/topics/impala_perf_stats.xml
@@ -581,8 +581,10 @@ show column stats year_month_day;
</p>
<note type="important">
- <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
- <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
+ <p
+ conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
+ <p
+ conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
</note>
<p>
@@ -629,12 +631,13 @@ show column stats year_month_day;
<li>
<p>
<codeph>COMPUTE INCREMENTAL STATS</codeph> uses some memory in the
- <cmdname>catalogd</cmdname> process, proportional to the number of partitions and
- number of columns in the applicable table. The memory overhead is approximately 400
- bytes for each column in each partition. This memory is reserved in the
- <cmdname>catalogd</cmdname> daemon, the <cmdname>statestored</cmdname> daemon, and
- in each instance of the <cmdname>impalad</cmdname> daemon.
- </p>
+ <cmdname>catalogd</cmdname> process, proportional to the number
+ of partitions and number of columns in the applicable table. The
+ memory overhead is approximately 400 bytes for each column in each
+ partition. This memory is reserved in the
+ <cmdname>catalogd</cmdname> daemon, the
+ <cmdname>statestored</cmdname> daemon, and in each instance of
+ the <cmdname>impalad</cmdname> daemon. </p>
</li>
<li>
@@ -705,42 +708,66 @@ show column stats year_month_day;
<concept id="inc_stats_size_limit_bytes">
<title>Maximum Serialized Stats Size</title>
<conbody>
- <p>
- When executing <codeph>COMPUTE INCREMENTAL STATS</codeph> on
- very large tables, use the configuration setting
- <codeph>inc_stats_size_limit_bytes</codeph> to prevent Impala from
- running out of memory while updating table metadata. If this limit
- is reached, Impala will stop loading the table and return an error.
- The error serves as an indication that <codeph>COMPUTE INCREMENTAL
- STATS</codeph> should not be used on the particular table.
- Consider spitting the table and using regular <codeph>COMPUTE
- STATS</codeph> ]if possible.
- </p>
-
- <p>
- The <codeph>inc_stats_size_limit_bytes</codeph> limit is set as a
- safety check, to prevent Impala from hitting the maximum limit for
+ <p>In Impala 3.0 and lower, when executing <codeph>COMPUTE INCREMENTAL
+ STATS</codeph> on very large tables, use the configuration setting
+ <codeph>--inc_stats_size_limit_bytes</codeph> to prevent Impala
+ from running out of memory while updating table metadata. If this
+ limit is reached, Impala stops loading the table and returns an
+ error. The error indicates that <codeph>COMPUTE
+ INCREMENTAL STATS</codeph> should not be used on that particular
+ table. Consider splitting the table and using regular <codeph>COMPUTE
+ STATS</codeph> if possible. </p>
+
+ <p> The <codeph>--inc_stats_size_limit_bytes</codeph> limit is set as
+ a safety check, to prevent Impala from hitting the maximum limit for
the table metadata. Note that this limit is only one part of the
- entire table's metadata all of which together must be below 2 GB.
- </p>
+ entire table's metadata, all of which together must be below 2 GB. </p>
- <p>
- The default value for <codeph>inc_stats_size_limit_bytes</codeph>
- is 209715200, 200 MB.
- </p>
+ <p> The default value for
+ <codeph>--inc_stats_size_limit_bytes</codeph> is 209715200 bytes
+ (200 MB). </p>
- <p> To change the <codeph>inc_stats_size_limit_bytes</codeph> value,
- restart <codeph>impalad</codeph> and <codeph>catalogd</codeph> with
- the new value specified in bytes, for example, 1048576000 for 1 GB.
- See <xref href="impala_config_options.xml#config_options"/> for the
- steps to change the option and restart Impala daemons. </p>
+ <p> To change the <codeph>--inc_stats_size_limit_bytes</codeph> value,
+ restart <codeph>impalad</codeph> and <codeph>catalogd</codeph> with
+ the new value specified in bytes, for example, 1048576000 for 1000 MB
+ (roughly 1 GB). See <xref
+ href="impala_config_options.xml#config_options"/> for the steps to
+ change the option and restart Impala daemons. </p>
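Because the setting takes a raw byte count, a small helper can avoid arithmetic mistakes when picking a new value. This is a minimal Python sketch, assuming "MB" means 1024 * 1024 bytes (which is how the documented default of 209715200 works out to 200 MB); the helper names are hypothetical and only format or parse the flag string, they do not touch a running cluster.

```python
# Sketch: converting between the byte count that
# --inc_stats_size_limit_bytes expects and the MB figures used in the text.
# Assumption: "MB" means 1024 * 1024 bytes, matching 209715200 = 200 MB.

FLAG_NAME = "--inc_stats_size_limit_bytes"
MIB = 1024 * 1024


def mib_to_bytes(mib: int) -> int:
    return mib * MIB


def bytes_to_mib(num_bytes: int) -> float:
    return num_bytes / MIB


def limit_flag(mib: int) -> str:
    """Format the flag to pass to both impalad and catalogd on restart."""
    return f"{FLAG_NAME}={mib_to_bytes(mib)}"


def parse_limit_flag(flag: str) -> int:
    """Return the byte value from a '--inc_stats_size_limit_bytes=N' string."""
    name, sep, value = flag.partition("=")
    if name != FLAG_NAME or not sep:
        raise ValueError(f"unexpected flag: {flag!r}")
    return int(value)


if __name__ == "__main__":
    print(limit_flag(200))   # prints: --inc_stats_size_limit_bytes=209715200
    print(limit_flag(1000))  # prints: --inc_stats_size_limit_bytes=1048576000
    print(bytes_to_mib(parse_limit_flag(f"{FLAG_NAME}=1048576000")))  # prints: 1000.0
```

The same value must be given to both daemons; the helper only produces the string, and how it is passed (startup script, management tool) depends on the deployment.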
- <note type="attention">
- The <codeph>inc_stats_size_limit_bytes</codeph> setting should be
+ <note type="attention"> The
+ <codeph>--inc_stats_size_limit_bytes</codeph> setting should be
increased with care. A big value for the setting, such as 1 GB or
more, can result in a spike in heap usage as well as a crash of
- Impala.
- </note>
+ Impala. </note>
+ <p>In Impala 3.1 and higher, improved handling of metadata updates
+ during <codeph>COMPUTE INCREMENTAL STATS</codeph> significantly
+ reduces the need for
+ <codeph>--inc_stats_size_limit_bytes</codeph>. </p>
+ </conbody>
+ </concept>
+ <concept id="pull_incremental_statistics">
+ <title>Loading Incremental Statistics from Catalogd</title>
+ <conbody>
+ <p>
+ Starting in Impala 3.1, a new configuration setting,
+ <codeph>--pull_incremental_statistics</codeph>, is available and set
+ to <codeph>true</codeph> by default. When you start the Impala
+ <cmdname>catalogd</cmdname> daemon and the <cmdname>impalad</cmdname>
+ coordinators with this setting enabled:
+ </p>
+ <ul>
+ <li> Newly created incremental stats are smaller, which reduces
+ memory pressure on the <cmdname>catalogd</cmdname> daemon. Your
+ users can keep more tables and partitions in the same catalog and
+ are less likely to crash <cmdname>catalogd</cmdname> with
+ out-of-memory errors. </li>
+ <li>
+ Incremental stats are not replicated to <cmdname>impalad</cmdname>
+ and are instead fetched on demand from <cmdname>catalogd</cmdname>,
+ which reduces the memory footprint of <cmdname>impalad</cmdname>.
+ </li>
+ </ul>
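The second point, replicating stats to every coordinator versus fetching them from catalogd on demand, can be pictured with a toy model. The Python sketch below is a conceptual illustration only: the Catalog and Coordinator classes, the partition keys, and the 400-byte blobs are hypothetical, and the assumption that a pulling coordinator keeps no local copy is a simplification, not a description of Impala internals.

```python
# Conceptual model only: hypothetical classes, not Impala source code.
# Contrasts replicating incremental stats to every coordinator with
# fetching them from catalogd on demand, as the text describes for
# --pull_incremental_statistics=true.
from typing import Dict, Tuple

PartitionKey = Tuple[str, str]  # (table name, partition spec), hypothetical


class Catalog:
    """Stands in for catalogd, which holds stats for every partition."""

    def __init__(self, stats: Dict[PartitionKey, bytes]) -> None:
        self.stats = stats
        self.fetches = 0  # number of on-demand requests served

    def fetch(self, key: PartitionKey) -> bytes:
        self.fetches += 1
        return self.stats[key]


class Coordinator:
    """Stands in for a coordinator impalad."""

    def __init__(self, catalog: Catalog, pull: bool) -> None:
        self.catalog = catalog
        self.pull = pull
        # Replicated mode keeps a full local copy; pull mode keeps none
        # (a simplifying assumption of this sketch).
        self.local: Dict[PartitionKey, bytes] = {} if pull else dict(catalog.stats)

    def stats_for(self, key: PartitionKey) -> bytes:
        if self.pull:
            return self.catalog.fetch(key)
        return self.local[key]

    def resident_bytes(self) -> int:
        return sum(len(v) for v in self.local.values())


if __name__ == "__main__":
    # 1,000 hypothetical partitions with 400-byte stats blobs each.
    catalog = Catalog({("sales", f"day={d}"): b"x" * 400 for d in range(1000)})
    replicated = Coordinator(catalog, pull=False)
    pulling = Coordinator(catalog, pull=True)
    pulling.stats_for(("sales", "day=7"))
    print(replicated.resident_bytes(), pulling.resident_bytes(), catalog.fetches)
    # prints: 400000 0 1
```

The trade-off the model makes visible is memory on each coordinator versus a round trip to catalogd whenever the stats are actually needed.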
+ <p>
+ We do not recommend changing the default setting of
+ <codeph>--pull_incremental_statistics</codeph>.
+ </p>
</conbody>
</concept>
@@ -980,8 +1007,7 @@ alter table <varname>table_name</varname> partition (<varname>keycol1</varname>=
frequently enough to keep up with data changes for a huge table.
</p>
- <p conref="../shared/impala_common.xml#common/set_column_stats_example"
- />
+ <p conref="../shared/impala_common.xml#common/set_column_stats_example"/>
</conbody>