You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by st...@apache.org on 2023/02/26 01:39:54 UTC
[impala] branch master updated (f54b3c375 -> d0592c0db)
This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
from f54b3c375 IMPALA-11713: Switch to C++17
new fa64be7cc IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
new d0592c0db IMPALA-11945: Fix Flaky Test in JwtHttpTest
The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
docs/topics/impala_iceberg.xml | 60 ++++++++++++++++
.../apache/impala/customcluster/JwtHttpTest.java | 83 +++++++++++++++++-----
2 files changed, 125 insertions(+), 18 deletions(-)
[impala] 01/02: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit fa64be7cc7074f201fff1eccc9cbf19520a19c55
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Thu Feb 23 16:05:31 2023 -0800
IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
IMPALA-11658 implements Iceberg manifest caching for Impala. This patch
adds documentation for configuring the cache(s).
Testing:
- Built docs locally
Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe
Reviewed-on: http://gerrit.cloudera.org:8080/19530
Reviewed-by: Daniel Becker <da...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
---
docs/topics/impala_iceberg.xml | 60 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 32363f3de..62abca615 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -606,4 +606,64 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
</p>
</conbody>
</concept>
+
+ <concept id="iceberg_manifest_caching">
+ <title>Iceberg manifest caching</title>
+ <conbody>
+ <p>
+ Starting from version 1.1.0, Apache Iceberg provides a mechanism to cache the
+ contents of Iceberg manifest files in memory. This manifest caching feature helps
+ to reduce repeated reads of small Iceberg manifest files from remote storage by
+ Coordinators and Catalogd. This feature can be enabled for Impala Coordinators and
+ Catalogd by setting properties in Hadoop's core-site.xml as in the following:
+ <codeblock>
+iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
+iceberg.io.manifest.cache-enabled=true;
+iceberg.io.manifest.cache.max-total-bytes=104857600;
+iceberg.io.manifest.cache.expiration-interval-ms=3600000;
+iceberg.io.manifest.cache.max-content-length=8388608;
+ </codeblock>
+ </p>
+ <p>
+ The description of each property is as follows:
+ <ul>
+ <li>
+ <codeph>iceberg.io-impl</codeph>: custom FileIO implementation to use in a
+ catalog. Must be set to enable manifest caching. Impala defaults to
+ HadoopFileIO. It is recommended to not change this to other than HadoopFileIO.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache-enabled</codeph>: enable/disable the
+ manifest caching feature.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>: maximum total
+ amount of bytes to cache in the manifest cache. Must be a positive value.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache.expiration-interval-ms</codeph>: maximum
+ duration for which an entry stays in the manifest cache. Must be a
+ non-negative value. Setting zero means cache entries expire only if it gets
+ evicted due to memory pressure from
+ <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+ </li>
+ <li>
+ <codeph>iceberg.io.manifest.cache.max-content-length</codeph>: maximum length
+ of a manifest file to be considered for caching in bytes. Manifest files with
+ a length exceeding this property value will not be cached. Must be set with a
+ positive value and lower than
+ <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+ </li>
+ </ul>
+ </p>
+ <p>
+ Manifest caching only works for tables that are loaded with either of
+ HadoopCatalogs or HiveCatalogs. Individual HadoopCatalog and HiveCatalog will have
+ separate manifest caches with the same configuration. By default, only 8 catalogs
+ can have their manifest cache active in memory. This number can be raised by
+ setting a higher value in the java system property
+ <codeph>iceberg.io.manifest.cache.fileio-max</codeph>.
+ </p>
+ </conbody>
+ </concept>
</concept>
[impala] 02/02: IMPALA-11945: Fix Flaky Test in JwtHttpTest
Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit d0592c0dbf4d634cc26b2e9872a9f62af5249cdd
Author: jasonmfehr <jf...@cloudera.com>
AuthorDate: Fri Feb 24 12:54:03 2023 -0800
IMPALA-11945: Fix Flaky Test in JwtHttpTest
The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl test has been
flaky in the impala-asf-master-core Jenkins build. This build was
executed twice with this test passing in one run and failing in the
other.
This test and another test named
testJwtAuthWithTrustedJwksHttpsUrlInvalidCN both follow the same
pattern. They attempt to start up a cluster with incorrectly
configured Impala daemons. Then, the Impala coordinator's log file
is searched to ensure the coordinator failed to start up for the
expected reason. Both these tests were introduced by IMPALA-11922.
The theory for the test flakiness is that the Impala coordinator logs
did not finish writing to disk before they were checked for the
expected startup failure message. The evidence backing this theory
is that all expected log messages were present in the log file except
for the final log line containing the error message.
Three changes were made to fix the flakiness
1. The tests where log files are search for startup failure messages
now use a single coordinator. Only a single coordinator is needed
thus this change eliminates potential issues caused by multiple
coordinators.
2. The tests sleep 5 seconds before searching the log files to give
time for the logs to be fully written to disk.
3. Log buffering in the Impala daemons was turned off by setting the
logbuflevel startup flag to -1 which turns off in-memory log
buffering and writes the logs directly to disk.
Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67
Reviewed-on: http://gerrit.cloudera.org:8080/19536
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
.../apache/impala/customcluster/JwtHttpTest.java | 83 +++++++++++++++++-----
1 file changed, 65 insertions(+), 18 deletions(-)
diff --git a/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java b/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
index 4a72c3e39..82fba0618 100644
--- a/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
+++ b/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
@@ -36,6 +36,7 @@ import org.apache.hive.service.rpc.thrift.*;
import org.apache.impala.testutil.WebClient;
import org.apache.impala.testutil.X509CertChain;
import org.apache.thrift.transport.THttpClient;
+import org.hamcrest.Matcher;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.junit.After;
import org.junit.Test;
@@ -51,6 +52,7 @@ public class JwtHttpTest {
private static final String CA_CERT = "cacert.pem";
private static final String SERVER_CERT = "server-cert.pem";
private static final String SERVER_KEY = "server-key.pem";
+ private static final String JWKS_FILE_NAME = "jwks_rs256.json";
WebClient client_ = new WebClient();
@@ -89,13 +91,34 @@ public class JwtHttpTest {
private void setUp(String impaladArgs, String catalogdArgs,
String statestoredArgs, int expectedRetCode)
throws Exception {
- if (createJWKSForWebServer_) createTempJWKSInWebServerRootDir("jwks_rs256.json");
+ if (createJWKSForWebServer_) createTempJWKSInWebServerRootDir(JWKS_FILE_NAME);
int ret = CustomClusterRunner.StartImpalaCluster(
impaladArgs, catalogdArgs, statestoredArgs);
assertEquals(expectedRetCode, ret);
}
+ /**
+ * Helper method to start a JWT auth enabled Impala cluster that has only a single
+ * coordinator daemon process.
+ *
+ * @param impaladArgs startup flags to send to the impala coordinator/executors
+ * @param catalogdArgs startup flags to send to the impala catalog
+ * @param statestoredArgs startup flags to send to the statestore
+ * @param expectedRetCode expected exit code for the start impala cluster command,
+ * if the cluster is expected to start successfully, set to 0
+ */
+ private void setUpWithSingleCoordinator(String impaladArgs, String catalogdArgs,
+ String statestoredArgs, int expectedRetCode)
+ throws Exception {
+ if (createJWKSForWebServer_) createTempJWKSInWebServerRootDir(JWKS_FILE_NAME);
+
+ int ret = CustomClusterRunner.StartImpalaCluster(
+ impaladArgs, catalogdArgs, statestoredArgs, new HashMap<String, String>(),
+ "--num_coordinators=1");
+ assertEquals(expectedRetCode, ret);
+ }
+
@After
public void cleanUp() throws Exception {
// Leave a cluster running with the default configuration, then delete temporary
@@ -178,7 +201,8 @@ public class JwtHttpTest {
public void testJwtAuth() throws Exception {
createJWKSForWebServer_ = false;
String jwksFilename =
- new File(System.getenv("IMPALA_HOME"), "testdata/jwt/jwks_rs256.json").getPath();
+ new File(System.getenv("IMPALA_HOME"),
+ String.format("testdata/jwt/%s", JWKS_FILE_NAME)).getPath();
setUp(String.format(
"--jwt_token_auth=true --jwt_validate_signature=true --jwks_file_path=%s "
+ "--jwt_allow_without_tls=true",
@@ -404,7 +428,7 @@ public class JwtHttpTest {
String jwksHttpUrl = "https://localhost:25010/www/temp_jwks.json";
String impaladJwtArgs = String.format("--jwt_token_auth=true "
+ "--jwt_validate_signature=true --jwks_url=%s "
- + "--jwt_allow_without_tls=true --log_dir=%s ",
+ + "--jwt_allow_without_tls=true --log_dir=%s --logbuflevel=-1 ",
jwksHttpUrl, logDir.toAbsolutePath());
String expectedErrString = String.format("Impalad services did not start correctly, "
+ "exiting. Error: Error downloading JWKS from '%s': Network error: curl "
@@ -413,14 +437,9 @@ public class JwtHttpTest {
// cluster start will fail because the TLS cert returned by the
// JWKS server is not trusted
- setUp(impaladJwtArgs, "", statestoreWebserverArgs, 1);
-
- // check in the impalad logs that the server startup failed for the expected reason
- List<String> logLines = Files.readAllLines(logDir.resolve("impalad.ERROR"));
+ setUpWithSingleCoordinator(impaladJwtArgs, "", statestoreWebserverArgs, 1);
- assertThat(String.format("Impalad startup failed but not for the expected reason. "
- + "See logs in the '%s' folder for details.", logDir), logLines,
- hasItem(containsString(expectedErrString)));
+ checkCoordinatorLogs(expectedErrString, logDir);
}
/**
@@ -448,8 +467,9 @@ public class JwtHttpTest {
String jwksHttpUrl = "https://localhost:25010/www/temp_jwks.json";
String impaladJwtArgs = String.format("--jwt_token_auth=true "
+ "--jwt_validate_signature=true --jwks_url=%s "
- + "--jwt_allow_without_tls=true --log_dir=%s --jwks_ca_certificate=%s ",
- jwksHttpUrl, logDir.toAbsolutePath(), Paths.get(certDir, CA_CERT));
+ + "--jwt_allow_without_tls=true --log_dir=%s --jwks_ca_certificate=%s "
+ + "--logbuflevel=-1 ", jwksHttpUrl, logDir.toAbsolutePath(),
+ Paths.get(certDir, CA_CERT));
String expectedErrString = String.format("Impalad services did not start correctly, "
+ "exiting. Error: Error downloading JWKS from '%s': Network error: curl "
+ "error: SSL peer certificate or SSH remote key was not OK: SSL: "
@@ -458,13 +478,9 @@ public class JwtHttpTest {
// cluster start will fail because the TLS cert returned by the
// JWKS server is not trusted
- setUp(impaladJwtArgs, "", statestoreWebserverArgs, 1);
+ setUpWithSingleCoordinator(impaladJwtArgs, "", statestoreWebserverArgs, 1);
- // check in the impalad logs that the server startup failed for the expected reason
- List<String> logLines = Files.readAllLines(logDir.resolve("impalad.ERROR"));
- assertThat(String.format("Impalad startup failed but not for the expected reason. "
- + "See logs in the '%s' folder for details.", logDir), logLines,
- hasItem(containsString(expectedErrString)));
+ checkCoordinatorLogs(expectedErrString, logDir);
}
/**
@@ -524,4 +540,35 @@ public class JwtHttpTest {
return certDir.toString();
}
+
+ /**
+ * Asserts that the specified string is present in the impalad.ERROR file within the
+ * specified log directory.
+ *
+ * @param expectedString The impalad.ERROR file is searched for this string. If it is
+ * not found, the test fails.
+ * @param logDir Location of the directory where log files are stored.
+ */
+ private void checkCoordinatorLogs(String expectedString, Path logDir)
+ throws IOException, InterruptedException {
+ // check in the impalad logs that the server startup failed for the expected reason
+ List<String> logLines = null;
+ Matcher<Iterable<? super String>> m = hasItem(containsString(expectedString));
+
+ // writing logs to disk may take some time, try a few times to search for the
+ // expected error in the log
+ for (int i=0; i<10; i++) {
+ logLines = Files.readAllLines(logDir.resolve("impalad.ERROR"));
+ if (m.matches(logLines)) {
+ break;
+ }
+ Thread.sleep(250);
+ }
+
+ // runs the matcher one more time to ensure a descriptive failure message is
+ // generated if the assert fails
+ assertThat(String.format("Impalad startup failed but not for the expected reason. "
+ + "See logs in the '%s' folder for details.", logDir), logLines,
+ hasItem(containsString(expectedString)));
+ }
}