Posted to commits@impala.apache.org by jo...@apache.org on 2019/07/11 21:36:00 UTC
[impala] 02/02: IMPALA-8729: [DOCS] Describe on-demand metadata feature
This is an automated email from the ASF dual-hosted git repository.
joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 8081045bebb253698bcd748a64ccff843e3e85ff
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Wed Jul 3 16:56:41 2019 -0700
IMPALA-8729: [DOCS] Describe on-demand metadata feature
- Overview of on-demand metadata.
- Config flags to enable/disable on-demand metadata.
Change-Id: I64261625c1d9b122c7cca59f9b004dda05810351
Reviewed-on: http://gerrit.cloudera.org:8080/13802
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
---
docs/topics/impala_metadata.xml | 98 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 98 insertions(+)
diff --git a/docs/topics/impala_metadata.xml b/docs/topics/impala_metadata.xml
index 9e6e1ee..e810a54 100644
--- a/docs/topics/impala_metadata.xml
+++ b/docs/topics/impala_metadata.xml
@@ -42,6 +42,104 @@ under the License.
</conbody>
+ <concept id="on_demand_metadata">
+
+ <title>On-demand Metadata</title>
+
+ <conbody>
+
+ <p>
+ In previous versions of Impala, every coordinator kept a full replica of the metadata
+ cached in <codeph>catalogd</codeph>, consuming a large amount of memory on each coordinator
+ with no option to evict. Metadata always propagated through <codeph>statestored</codeph>
+ and suffered from head-of-line blocking: for example, one user loading a big table could
+ block another user loading a small table.
+ </p>
+
+ <p>
+ With this new feature, the coordinators pull metadata as needed from
+ <codeph>catalogd</codeph> and cache it locally. The cached metadata gets evicted
+ automatically under memory pressure.
+ </p>
+
+ <p>
+ The granularity of on-demand metadata fetches is now at the partition level between the
+ coordinator and <codeph>catalogd</codeph>. Common use cases like add/drop partitions do
+ not trigger unnecessary serialization/deserialization of large metadata.
+ </p>
+
+ <p>
+ This feature is disabled by default.
+ </p>
+
+ <p>
+ The feature can be used in either of the following modes.
+ <dl>
+ <dlentry>
+
+ <dt>
+ Metadata on-demand mode
+ </dt>
+
+ <dd>
+ In this mode, all coordinators fetch metadata on demand.
+ </dd>
+
+ <dd>
+ Set the following on <codeph>catalogd</codeph>:
+<codeblock>--catalog_topic_mode=minimal</codeblock>
+ </dd>
+
+ <dd>
+ Set the following on all <codeph>impalad</codeph> coordinators:
+<codeblock>--use_local_catalog=true</codeblock>
+ </dd>
+
+ </dlentry>
+
+ <dlentry>
+
+ <dt>
+ Mixed mode
+ </dt>
+
+ <dd>
+ In this mode, only some coordinators are configured to fetch metadata on demand.
+ </dd>
+
+ <dd>
+ We recommend that you use the mixed mode only for testing the local catalog’s impact
+ on heap usage.
+ </dd>
+
+ <dd>
+ Set the following on <codeph>catalogd</codeph>:
+<codeblock>--catalog_topic_mode=mixed</codeblock>
+ </dd>
+
+ <dd>
+ Set the following on <codeph>impalad</codeph> coordinators that use on-demand
+ metadata:
+<codeblock>--use_local_catalog=true</codeblock>
+ </dd>
+
+ </dlentry>
+ </dl>
+ </p>
+
+ <p>
+ <b>Limitation:</b>
+ </p>
+
+ <p>
+ Global <codeph>INVALIDATE METADATA</codeph> statements are not supported when this feature
+ is enabled. If your workload requires them, do not use this feature.
+ </p>
+
+ </conbody>
+
+ </concept>
+
<concept id="auto_invalidate_metadata">
<title>Automatic Invalidation of Metadata Cache</title>