Posted to commits@impala.apache.org by ar...@apache.org on 2018/10/31 18:34:37 UTC

[2/4] impala git commit: IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog

IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog

--pull_incremental_statistics described in the Incremental Stats section.

Change-Id: I8fd9b88138350406065df2f39a48043178759949
Reviewed-on: http://gerrit.cloudera.org:8080/11790
Reviewed-by: Greg Rahn <gr...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/d4c0ce32
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/d4c0ce32
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/d4c0ce32

Branch: refs/heads/master
Commit: d4c0ce32a67a3f8d7fd4b8e92e42f6d4567d8db2
Parents: dcc4024
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Oct 25 11:32:27 2018 -0700
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Wed Oct 31 00:25:45 2018 +0000

----------------------------------------------------------------------
 docs/shared/impala_common.xml     |  16 ++---
 docs/topics/impala_perf_stats.xml | 106 ++++++++++++++++++++-------------
 2 files changed, 75 insertions(+), 47 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/d4c0ce32/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index a45f802..8b79596 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1422,13 +1422,15 @@ drop database temp;
         for the first time on a given table.
       </p>
 
-      <p id="incremental_stats_caveats">
-        For a table with a huge number of partitions and many columns, the approximately 400 bytes
-        of metadata per column per partition can add up to significant memory overhead, as it must
-        be cached on the <cmdname>catalogd</cmdname> host and on every <cmdname>impalad</cmdname> host
-        that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
-        you might experience service downtime.
-      </p>
+      <p id="incremental_stats_caveats"> In Impala 3.0 and lower, approximately
+        400 bytes of metadata per column per partition are needed for caching.
+        Tables with a large number of partitions and many columns can incur a
+        significant memory overhead, as the metadata must be cached on the
+          <cmdname>catalogd</cmdname> host and on every
+          <cmdname>impalad</cmdname> host that is eligible to be a coordinator.
+        If this metadata for all tables exceeds 2 GB, you might experience
+        service downtime. In Impala 3.1 and higher, the issue is alleviated
+        by improved handling of incremental stats.</p>
 
       <p id="incremental_partition_spec">
         The <codeph>PARTITION</codeph> clause is only allowed in combination with the <codeph>INCREMENTAL</codeph>

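The sizing rule in the caveat above can be turned into a quick back-of-the-envelope check. The Python sketch below is illustrative only and applies to Impala 3.0 and lower. It assumes the approximate 400 bytes per column per partition figure holds uniformly, treats "2 GB" as 2 GiB (an assumption), and compares the estimate against the documented 209715200-byte default of --inc_stats_size_limit_bytes. The function names are hypothetical and not part of Impala.

```python
# Rough sizing sketch for Impala 3.0 and lower (hypothetical helper names).
# Assumes the ~400 bytes per column per partition figure applies uniformly;
# real serialized sizes vary with data types and statistics content.
BYTES_PER_COLUMN_PER_PARTITION = 400          # approximate figure from the docs
INC_STATS_SIZE_LIMIT_BYTES = 209_715_200      # documented default (200 MB)
TOTAL_METADATA_CEILING_BYTES = 2 * 1024 ** 3  # "2 GB" read as 2 GiB (assumption)


def estimate_inc_stats_bytes(num_columns: int, num_partitions: int) -> int:
    """Approximate incremental-stats footprint of a single table, in bytes."""
    return BYTES_PER_COLUMN_PER_PARTITION * num_columns * num_partitions


def check_table(num_columns: int, num_partitions: int) -> str:
    """Classify the estimate against the 2 GB ceiling and the default limit."""
    est = estimate_inc_stats_bytes(num_columns, num_partitions)
    if est > TOTAL_METADATA_CEILING_BYTES:
        return "over 2 GB metadata ceiling"
    if est > INC_STATS_SIZE_LIMIT_BYTES:
        return "over --inc_stats_size_limit_bytes default"
    return "ok"


if __name__ == "__main__":
    # Example table: 100 columns spread across 10,000 partitions.
    cols, parts = 100, 10_000
    print(estimate_inc_stats_bytes(cols, parts), check_table(cols, parts))
```

For 100 columns across 10,000 partitions the estimate is 400,000,000 bytes (about 381 MiB): above the 200 MB default limit but below the 2 GB ceiling, which is the situation where the docs suggest splitting the table or using regular COMPUTE STATS.
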
http://git-wip-us.apache.org/repos/asf/impala/blob/d4c0ce32/docs/topics/impala_perf_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_stats.xml b/docs/topics/impala_perf_stats.xml
index 15a00f7..861aba3 100644
--- a/docs/topics/impala_perf_stats.xml
+++ b/docs/topics/impala_perf_stats.xml
@@ -581,8 +581,10 @@ show column stats year_month_day;
         </p>
 
         <note type="important">
-          <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
-          <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
+          <p
+            conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
+          <p
+            conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
         </note>
 
         <p>
@@ -629,12 +631,13 @@ show column stats year_month_day;
           <li>
             <p>
               <codeph>COMPUTE INCREMENTAL STATS</codeph> uses some memory in the
-              <cmdname>catalogd</cmdname> process, proportional to the number of partitions and
-              number of columns in the applicable table. The memory overhead is approximately 400
-              bytes for each column in each partition. This memory is reserved in the
-              <cmdname>catalogd</cmdname> daemon, the <cmdname>statestored</cmdname> daemon, and
-              in each instance of the <cmdname>impalad</cmdname> daemon.
-            </p>
+                <cmdname>catalogd</cmdname> process, proportional to the number
+              of partitions and number of columns in the applicable table. The
+              memory overhead is approximately 400 bytes for each column in each
+              partition. This memory is reserved in the
+                <cmdname>catalogd</cmdname> daemon, the
+                <cmdname>statestored</cmdname> daemon, and in each instance of
+              the <cmdname>impalad</cmdname> daemon. </p>
           </li>
 
           <li>
@@ -705,42 +708,66 @@ show column stats year_month_day;
       <concept id="inc_stats_size_limit_bytes">
         <title>Maximum Serialized Stats Size</title>
         <conbody>
-          <p>
-            When executing <codeph>COMPUTE INCREMENTAL STATS</codeph> on
-            very large tables, use the configuration setting
-              <codeph>inc_stats_size_limit_bytes</codeph> to prevent Impala from
-            running out of memory while updating table metadata. If this limit
-            is reached, Impala will stop loading the table and return an error.
-            The error serves as an indication that <codeph>COMPUTE INCREMENTAL
-              STATS</codeph> should not be used on the particular table.
-            Consider spitting the table and using regular <codeph>COMPUTE
-              STATS</codeph> ]if possible.
-          </p>
-
-          <p>
-            The <codeph>inc_stats_size_limit_bytes</codeph> limit is set as a
-            safety check, to prevent Impala from hitting the maximum limit for
+          <p>In Impala 3.0 and lower, when executing <codeph>COMPUTE INCREMENTAL
+              STATS</codeph> on very large tables, use the configuration setting
+              <codeph>--inc_stats_size_limit_bytes</codeph> to prevent Impala
+            from running out of memory while updating table metadata. If this
+            limit is reached, Impala will stop loading the table and return an
+            error. The error serves as an indication that <codeph>COMPUTE
+              INCREMENTAL STATS</codeph> should not be used on the particular
+            table. Consider splitting the table and using regular <codeph>COMPUTE
+              STATS</codeph> if possible. </p>
+
+          <p> The <codeph>--inc_stats_size_limit_bytes</codeph> limit is set as
+            a safety check, to prevent Impala from hitting the maximum limit for
             the table metadata. Note that this limit is only one part of the
-            entire table's metadata all of which together must be below 2 GB.
-          </p>
+            entire table's metadata, all of which together must be below 2 GB. </p>
 
-          <p>
-            The default value for <codeph>inc_stats_size_limit_bytes</codeph>
-            is 209715200, 200 MB.
-          </p>
+          <p> The default value for
+              <codeph>--inc_stats_size_limit_bytes</codeph> is 209715200, 200
+            MB. </p>
 
-          <p> To change the <codeph>inc_stats_size_limit_bytes</codeph> value,
-            restart <codeph>impalad</codeph> and <codeph>catalogd</codeph> with
-            the new value specified in bytes, for example, 1048576000 for 1 GB.
-            See <xref href="impala_config_options.xml#config_options"/> for the
-            steps to change the option and restart Impala daemons. </p>
+          <p> To change the <codeph>--inc_stats_size_limit_bytes</codeph> value,
+            restart <codeph>impalad</codeph> and <codeph>catalogd</codeph> with
+            the new value specified in bytes, for example, 1048576000 for 1 GB.
+            See <xref href="impala_config_options.xml#config_options"/> for the
+            steps to change the option and restart Impala daemons. </p>
 
-          <note type="attention">
-            The <codeph>inc_stats_size_limit_bytes</codeph> setting should be
+          <note type="attention"> The
+              <codeph>--inc_stats_size_limit_bytes</codeph> setting should be
             increased with care. A big value for the setting, such as 1 GB or
             more, can result in a spike in heap usage as well as a crash of
-            Impala.
-          </note>
+            Impala. </note>
+          <p>In Impala 3.1 and higher, Impala improved how metadata is updated
+            when executing <codeph>COMPUTE INCREMENTAL STATS</codeph>,
+            significantly reducing the need for
+              <codeph>--inc_stats_size_limit_bytes</codeph>. </p>
+        </conbody>
+      </concept>
+      <concept id="pull_incremental_statistics">
+        <title>Loading Incremental Statistics from Catalogd</title>
+        <conbody>
+          <p>
+            Starting in Impala 3.1, a new configuration setting,
+              <codeph>--pull_incremental_statistics</codeph>, was added and set
+            to <codeph>true</codeph> by default. When you start Impala catalogd
+            and impalad coordinators with this setting enabled:
+          </p>
+          <ul>
+            <li> Newly created incremental stats are smaller in size, thus
+              reducing memory pressure on the catalogd daemon. Users can keep
+              more tables and partitions in the same catalog, with a lower
+              chance of catalogd crashing due to out-of-memory errors. </li>
+            <li>
+              Incremental stats will not be replicated to impalad and will be
+              accessed on demand from catalogd, resulting in a reduced memory
+              footprint of impalad.
+            </li>
+          </ul>
+          <p>
+            We do not recommend changing the default setting of
+              <codeph>--pull_incremental_statistics</codeph>.
+          </p>
         </conbody>
       </concept>
 
@@ -980,8 +1007,7 @@ alter table <varname>table_name</varname> partition (<varname>keycol1</varname>=
           frequently enough to keep up with data changes for a huge table.
         </p>
 
-        <p conref="../shared/impala_common.xml#common/set_column_stats_example"
-        />
+        <p conref="../shared/impala_common.xml#common/set_column_stats_example"/>
 
       </conbody>