Posted to commits@impala.apache.org by st...@apache.org on 2023/02/26 01:39:55 UTC
[impala] 01/02: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit fa64be7cc7074f201fff1eccc9cbf19520a19c55
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Thu Feb 23 16:05:31 2023 -0800
IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
IMPALA-11658 implements Iceberg manifest caching for Impala. This patch
adds documentation for configuring the cache(s).
Testing:
- Built docs locally
Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe
Reviewed-on: http://gerrit.cloudera.org:8080/19530
Reviewed-by: Daniel Becker <da...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
---
docs/topics/impala_iceberg.xml | 60 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
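The patch documents several constraints on the manifest cache properties: iceberg.io-impl must be set, max-total-bytes and max-content-length must be positive, expiration-interval-ms must be non-negative, and max-content-length must be lower than max-total-bytes. As a minimal sketch, the Java class below checks a set of property values against exactly those documented rules. ManifestCacheSettings and validate are hypothetical names, not part of Iceberg or Impala, and Iceberg itself may enforce or report these rules differently. Properties that are absent are skipped rather than replaced with guessed defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical checker (not an Iceberg or Impala API) for the documented
// manifest cache constraints. It only mirrors the rules stated in the docs.
class ManifestCacheSettings {
    static final String IO_IMPL = "iceberg.io-impl";
    static final String ENABLED = "iceberg.io.manifest.cache-enabled";
    static final String MAX_TOTAL_BYTES = "iceberg.io.manifest.cache.max-total-bytes";
    static final String EXPIRATION_MS = "iceberg.io.manifest.cache.expiration-interval-ms";
    static final String MAX_CONTENT_LENGTH = "iceberg.io.manifest.cache.max-content-length";

    // Returns null if the key is absent; records an error if the value is not a number.
    static Long parse(Map<String, String> props, String key, List<String> errors) {
        String raw = props.get(key);
        if (raw == null) {
            return null;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            errors.add(key + " is not a number: " + raw);
            return null;
        }
    }

    // Returns every violated rule; an empty list means the values pass these checks.
    static List<String> validate(Map<String, String> props) {
        List<String> errors = new ArrayList<>();
        if (!"true".equalsIgnoreCase(props.get(ENABLED))) {
            return errors; // caching disabled: the remaining rules do not apply
        }
        if (props.get(IO_IMPL) == null) {
            errors.add(IO_IMPL + " must be set to enable manifest caching");
        }
        Long total = parse(props, MAX_TOTAL_BYTES, errors);
        Long expiration = parse(props, EXPIRATION_MS, errors);
        Long maxLength = parse(props, MAX_CONTENT_LENGTH, errors);
        if (total != null && total <= 0) {
            errors.add(MAX_TOTAL_BYTES + " must be positive");
        }
        if (expiration != null && expiration < 0) {
            errors.add(EXPIRATION_MS + " must be non-negative (0 disables time-based expiry)");
        }
        if (maxLength != null && maxLength <= 0) {
            errors.add(MAX_CONTENT_LENGTH + " must be positive");
        }
        if (maxLength != null && total != null && maxLength >= total) {
            errors.add(MAX_CONTENT_LENGTH + " must be lower than " + MAX_TOTAL_BYTES);
        }
        return errors;
    }

    public static void main(String[] args) {
        // Values from the example in the patch, with the trailing semicolons omitted.
        Map<String, String> props = Map.of(
            IO_IMPL, "org.apache.iceberg.hadoop.HadoopFileIO",
            ENABLED, "true",
            MAX_TOTAL_BYTES, "104857600",
            EXPIRATION_MS, "3600000",
            MAX_CONTENT_LENGTH, "8388608");
        System.out.println(validate(props)); // prints []
    }
}
```

A check like this is only a convenience for catching obvious mistakes before deployment; the authoritative behavior is whatever the Iceberg version in use does with the values.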
diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 32363f3de..62abca615 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -606,4 +606,64 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
</p>
</conbody>
</concept>
+
+ <concept id="iceberg_manifest_caching">
+ <title>Iceberg manifest caching</title>
+ <conbody>
+ <p>
+ Starting with version 1.1.0, Apache Iceberg can cache the contents of Iceberg
+ manifest files in memory. Manifest caching reduces repeated reads of small
+ manifest files from remote storage by Impala Coordinators and Catalogd. To
+ enable it for the Coordinators and Catalogd, set the following properties in
+ Hadoop's <codeph>core-site.xml</codeph>:
+ <codeblock>
+iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO
+iceberg.io.manifest.cache-enabled=true
+iceberg.io.manifest.cache.max-total-bytes=104857600
+iceberg.io.manifest.cache.expiration-interval-ms=3600000
+iceberg.io.manifest.cache.max-content-length=8388608
+ </codeblock>
+ </p>
+ <p>
+ Each property is described below:
+ <ul>
+ <li>
+ <codeph>iceberg.io-impl</codeph>: the custom FileIO implementation that a
+ catalog uses. It must be set to enable manifest caching. Impala defaults to
+ HadoopFileIO, and changing it to any other implementation is not recommended.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache-enabled</codeph>: enables or disables
+ manifest caching.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>: maximum total
+ number of bytes stored in the manifest cache. Must be a positive value.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache.expiration-interval-ms</codeph>: maximum
+ time, in milliseconds, that an entry stays in the manifest cache. Must be a
+ non-negative value. A value of zero means entries never expire by time; they
+ are only evicted when the cache reaches the limit set by
+ <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache.max-content-length</codeph>: maximum size,
+ in bytes, of a manifest file that is eligible for caching. Manifest files
+ larger than this value are not cached. Must be a positive value that is
+ lower than
+ <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+ </li>
+ </ul>
+ </p>
+ <p>
+ Manifest caching only works for tables that are loaded with either
+ HadoopCatalog or HiveCatalog. Each HadoopCatalog and HiveCatalog instance has
+ its own manifest cache, and all of them share the same configuration. By default,
+ at most 8 catalogs can have an active manifest cache in memory. To raise this
+ limit, set a higher value for the Java system property
+ <codeph>iceberg.io.manifest.cache.fileio-max</codeph>.
+ </p>
+ </conbody>
+ </concept>
</concept>
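The last paragraph of the patch says that each HadoopCatalog or HiveCatalog gets its own manifest cache and that only a bounded number of those caches (8 by default, adjustable through the Java system property iceberg.io.manifest.cache.fileio-max) stay active in memory. The sketch below models that bound with a least-recently-used map. CatalogCacheRegistry, ManifestCache, cacheFor, and isActive are hypothetical names, the cache contents are not modeled, and the LRU eviction policy is an assumption: Iceberg's actual policy for dropping per-catalog caches may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "at most N catalogs keep an active manifest cache".
// Not Iceberg code; eviction here is plain LRU, which is an assumption.
class CatalogCacheRegistry {
    // Stand-in for one catalog's manifest cache; its contents are not modeled.
    static final class ManifestCache {
        final String catalogName;
        ManifestCache(String catalogName) { this.catalogName = catalogName; }
    }

    private final Map<String, ManifestCache> caches;

    CatalogCacheRegistry(int maxCaches) {
        // accessOrder=true makes iteration order least-recently-used first,
        // and removeEldestEntry drops that entry once the bound is exceeded.
        this.caches = new LinkedHashMap<String, ManifestCache>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, ManifestCache> eldest) {
                return size() > maxCaches;
            }
        };
    }

    // Returns the catalog's cache, creating it (and possibly evicting another) if needed.
    ManifestCache cacheFor(String catalogName) {
        ManifestCache cache = caches.get(catalogName); // get() refreshes LRU order
        if (cache == null) {
            cache = new ManifestCache(catalogName);
            caches.put(catalogName, cache); // may evict the least recently used cache
        }
        return cache;
    }

    // containsKey() does not change the access order.
    boolean isActive(String catalogName) { return caches.containsKey(catalogName); }

    int activeCount() { return caches.size(); }

    public static void main(String[] args) {
        // Reads the documented system property; 8 is the documented default.
        int maxCaches = Integer.getInteger("iceberg.io.manifest.cache.fileio-max", 8);
        CatalogCacheRegistry registry = new CatalogCacheRegistry(maxCaches);
        for (int i = 1; i <= 9; i++) {
            registry.cacheFor("catalog" + i);
        }
        System.out.println(registry.isActive("catalog1") + " " + registry.isActive("catalog9"));
    }
}
```

With the system property unset, main prints "false true": after nine catalogs request a cache, the least recently used one (catalog1) is no longer active, which is why raising fileio-max can matter when Impala loads tables through more than 8 catalogs.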