Posted to commits@impala.apache.org by st...@apache.org on 2023/02/26 01:39:55 UTC

[impala] 01/02: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit fa64be7cc7074f201fff1eccc9cbf19520a19c55
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Thu Feb 23 16:05:31 2023 -0800

    IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
    
    IMPALA-11658 implements Iceberg manifest caching for Impala. This patch
    adds documentation for configuring the cache(s).
    
    Testing:
    - Built docs locally
    
    Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe
    Reviewed-on: http://gerrit.cloudera.org:8080/19530
    Reviewed-by: Daniel Becker <da...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
---
 docs/topics/impala_iceberg.xml | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 32363f3de..62abca615 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -606,4 +606,64 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
       </p>
     </conbody>
   </concept>
+
+  <concept id="iceberg_manifest_caching">
+    <title>Iceberg manifest caching</title>
+    <conbody>
+      <p>
+        Starting with version 1.1.0, Apache Iceberg provides a mechanism to cache the
+        contents of Iceberg manifest files in memory. This manifest caching feature helps
+        reduce repeated reads of small Iceberg manifest files from remote storage by
+        Coordinators and Catalogd. To enable it for Impala Coordinators and Catalogd, set
+        the following properties in Hadoop's core-site.xml (shown as name=value pairs):
+        <codeblock>
+iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO
+iceberg.io.manifest.cache-enabled=true
+iceberg.io.manifest.cache.max-total-bytes=104857600
+iceberg.io.manifest.cache.expiration-interval-ms=3600000
+iceberg.io.manifest.cache.max-content-length=8388608
+        </codeblock>
+      </p>
+      <p>
+        The description of each property is as follows:
+        <ul>
+          <li>
+            <codeph>iceberg.io-impl</codeph>: the custom FileIO implementation that a
+            catalog uses. Must be set to enable manifest caching. Impala uses HadoopFileIO
+            by default, and changing it to any other implementation is not recommended.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache-enabled</codeph>: enables or disables the
+            manifest caching feature.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>: maximum total
+            number of bytes to hold in the manifest cache. Must be a positive value.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.expiration-interval-ms</codeph>: maximum
+            duration, in milliseconds, for which an entry stays in the manifest cache. Must
+            be a non-negative value. Setting it to zero means cache entries expire only when
+            they are evicted because the cache has reached the limit set by
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-content-length</codeph>: maximum length,
+            in bytes, of a manifest file that is considered for caching. Manifest files
+            longer than this value are not cached. Must be a positive value lower than
+            the value of
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+        </ul>
+      </p>
+      <p>
+        Manifest caching works only for tables that are loaded through a HadoopCatalog or
+        a HiveCatalog. Each HadoopCatalog and HiveCatalog has its own manifest cache, and
+        all of these caches share the same configuration. By default, at most 8 catalogs
+        can have an active manifest cache in memory. To raise this limit, set a higher
+        value for the Java system property
+        <codeph>iceberg.io.manifest.cache.fileio-max</codeph>.
+      </p>
+    </conbody>
+  </concept>
 </concept>
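The patch's codeblock lists each setting as a name=value pair. In core-site.xml itself, Hadoop's configuration format expects one `<property>` element per setting, with `<name>` and `<value>` children inside the top-level `<configuration>` element. A minimal sketch of the equivalent entries follows. It reuses the example values from the patch: 100 MiB total cache, a 1-hour expiration, and an 8 MiB per-manifest limit. These values are illustrative, not tuned recommendations. In an existing core-site.xml, the `<property>` elements would be merged into the file's current `<configuration>` element rather than added as a second root.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Manifest caching requires iceberg.io-impl to be set; HadoopFileIO is the recommended value. -->
  <property>
    <name>iceberg.io-impl</name>
    <value>org.apache.iceberg.hadoop.HadoopFileIO</value>
  </property>
  <!-- Turn the manifest cache on. -->
  <property>
    <name>iceberg.io.manifest.cache-enabled</name>
    <value>true</value>
  </property>
  <!-- 100 MiB total cache capacity (104857600 = 100 * 1024 * 1024 bytes). -->
  <property>
    <name>iceberg.io.manifest.cache.max-total-bytes</name>
    <value>104857600</value>
  </property>
  <!-- Entries expire after 1 hour (3600000 ms); 0 would mean size-based eviction only. -->
  <property>
    <name>iceberg.io.manifest.cache.expiration-interval-ms</name>
    <value>3600000</value>
  </property>
  <!-- Manifests larger than 8 MiB (8388608 bytes) are not cached; must stay below max-total-bytes. -->
  <property>
    <name>iceberg.io.manifest.cache.max-content-length</name>
    <value>8388608</value>
  </property>
</configuration>
```

Note that `iceberg.io.manifest.cache.fileio-max` does not belong in this file. It is a Java system property, so it has to be supplied to the JVM of the Coordinators and Catalogd separately.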