You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by st...@apache.org on 2023/02/26 01:39:54 UTC

[impala] branch master updated (f54b3c375 -> d0592c0db)

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


    from f54b3c375 IMPALA-11713: Switch to C++17
     new fa64be7cc IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
     new d0592c0db IMPALA-11945: Fix Flaky Test in JwtHttpTest

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/topics/impala_iceberg.xml                     | 60 ++++++++++++++++
 .../apache/impala/customcluster/JwtHttpTest.java   | 83 +++++++++++++++++-----
 2 files changed, 125 insertions(+), 18 deletions(-)


[impala] 01/02: IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit fa64be7cc7074f201fff1eccc9cbf19520a19c55
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Thu Feb 23 16:05:31 2023 -0800

    IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg
    
    IMPALA-11658 implements Iceberg manifest caching for Impala. This patch
    adds documentation for configuring the cache(s).
    
    Testing:
    - Built docs locally
    
    Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe
    Reviewed-on: http://gerrit.cloudera.org:8080/19530
    Reviewed-by: Daniel Becker <da...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
---
 docs/topics/impala_iceberg.xml | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 32363f3de..62abca615 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -606,4 +606,64 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
       </p>
     </conbody>
   </concept>
+
+  <concept id="iceberg_manifest_caching">
+    <title>Iceberg manifest caching</title>
+    <conbody>
+      <p>
+        Starting from version 1.1.0, Apache Iceberg provides a mechanism to cache the
+        contents of Iceberg manifest files in memory. This manifest caching feature helps
+        to reduce repeated reads of small Iceberg manifest files from remote storage by
+        Coordinators and Catalogd. This feature can be enabled for Impala Coordinators and
+        Catalogd by setting properties in Hadoop's core-site.xml as in the following:
+        <codeblock>
+iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
+iceberg.io.manifest.cache-enabled=true;
+iceberg.io.manifest.cache.max-total-bytes=104857600;
+iceberg.io.manifest.cache.expiration-interval-ms=3600000;
+iceberg.io.manifest.cache.max-content-length=8388608;
+        </codeblock>
+      </p>
+      <p>
+        The description of each property is as follows:
+        <ul>
+          <li>
+            <codeph>iceberg.io-impl</codeph>: custom FileIO implementation to use in a
+            catalog. Must be set to enable manifest caching. Impala defaults to
+            HadoopFileIO. It is recommended to not change this to other than HadoopFileIO.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache-enabled</codeph>: enable/disable the
+            manifest caching feature.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>: maximum total
+            amount of bytes to cache in the manifest cache. Must be a positive value.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.expiration-interval-ms</codeph>: maximum
+            duration for which an entry stays in the manifest cache. Must be a
+            non-negative value. Setting zero means cache entries expire only if it gets
+            evicted due to memory pressure from
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-content-length</codeph>: maximum length
+            of a manifest file to be considered for caching in bytes. Manifest files with
+            a length exceeding this property value will not be cached. Must be set with a
+            positive value and lower than
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+        </ul>
+      </p>
+      <p>
+        Manifest caching only works for tables that are loaded with either of
+        HadoopCatalogs or HiveCatalogs. Individual HadoopCatalog and HiveCatalog will have
+        separate manifest caches with the same configuration. By default, only 8 catalogs
+        can have their manifest cache active in memory. This number can be raised by
+        setting a higher value in the java system property
+        <codeph>iceberg.io.manifest.cache.fileio-max</codeph>.
+      </p>
+    </conbody>
+  </concept>
 </concept>


[impala] 02/02: IMPALA-11945: Fix Flaky Test in JwtHttpTest

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d0592c0dbf4d634cc26b2e9872a9f62af5249cdd
Author: jasonmfehr <jf...@cloudera.com>
AuthorDate: Fri Feb 24 12:54:03 2023 -0800

    IMPALA-11945: Fix Flaky Test in JwtHttpTest
    
    The JwtHttpTest.testJwtAuthWithUntrustedJwksHttpsUrl test has been
    flaky in the impala-asf-master-core Jenkins build. This build was
    executed twice with this test passing in one run and failing in the
    other.
    
    This test and another test named
    testJwtAuthWithTrustedJwksHttpsUrlInvalidCN both follow the same
    pattern. They attempt to start up a cluster with incorrectly
    configured Impala daemons. Then, the Impala coordinator's log file
    is searched to ensure the coordinator failed to start up for the
    expected reason.  Both these tests were introduced by IMPALA-11922.
    
    The theory for the test flakiness is that the Impala coordinator logs
    did not finish writing to disk before they were checked for the
    expected startup failure message. The evidence backing this theory
    is that all expected log messages were present in the log file except
    for the final log line containing the error message.
    
    Three changes were made to fix the flakiness
    1. The tests where log files are search for startup failure messages
       now use a single coordinator. Only a single coordinator is needed
       thus this change eliminates potential issues caused by multiple
       coordinators.
    2. The tests sleep 5 seconds before searching the log files to give
       time for the logs to be fully written to disk.
    3. Log buffering in the Impala daemons was turned off by setting the
       logbuflevel startup flag to -1 which turns off in-memory log
       buffering and writes the logs directly to disk.
    
    Change-Id: I26d8c2cffa7bc095b81eb572709f55b04ac1fa67
    Reviewed-on: http://gerrit.cloudera.org:8080/19536
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../apache/impala/customcluster/JwtHttpTest.java   | 83 +++++++++++++++++-----
 1 file changed, 65 insertions(+), 18 deletions(-)

diff --git a/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java b/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
index 4a72c3e39..82fba0618 100644
--- a/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
+++ b/fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java
@@ -36,6 +36,7 @@ import org.apache.hive.service.rpc.thrift.*;
 import org.apache.impala.testutil.WebClient;
 import org.apache.impala.testutil.X509CertChain;
 import org.apache.thrift.transport.THttpClient;
+import org.hamcrest.Matcher;
 import org.apache.thrift.protocol.TBinaryProtocol;
 import org.junit.After;
 import org.junit.Test;
@@ -51,6 +52,7 @@ public class JwtHttpTest {
   private static final String CA_CERT = "cacert.pem";
   private static final String SERVER_CERT = "server-cert.pem";
   private static final String SERVER_KEY = "server-key.pem";
+  private static final String JWKS_FILE_NAME = "jwks_rs256.json";
 
   WebClient client_ = new WebClient();
 
@@ -89,13 +91,34 @@ public class JwtHttpTest {
   private void setUp(String impaladArgs, String catalogdArgs,
       String statestoredArgs, int expectedRetCode)
       throws Exception {
-    if (createJWKSForWebServer_) createTempJWKSInWebServerRootDir("jwks_rs256.json");
+    if (createJWKSForWebServer_) createTempJWKSInWebServerRootDir(JWKS_FILE_NAME);
 
     int ret = CustomClusterRunner.StartImpalaCluster(
         impaladArgs, catalogdArgs, statestoredArgs);
     assertEquals(expectedRetCode, ret);
   }
 
+  /**
+   * Helper method to start a JWT auth enabled Impala cluster that has only a single
+   * coordinator daemon process.
+   *
+   * @param impaladArgs startup flags to send to the impala coordinator/executors
+   * @param catalogdArgs startup flags to send to the impala catalog
+   * @param statestoredArgs startup flags to send to the statestore
+   * @param expectedRetCode expected exit code for the start impala cluster command,
+   *                        if the cluster is expected to start successfully, set to 0
+   */
+  private void setUpWithSingleCoordinator(String impaladArgs, String catalogdArgs,
+      String statestoredArgs, int expectedRetCode)
+      throws Exception {
+    if (createJWKSForWebServer_) createTempJWKSInWebServerRootDir(JWKS_FILE_NAME);
+
+    int ret = CustomClusterRunner.StartImpalaCluster(
+      impaladArgs, catalogdArgs, statestoredArgs, new HashMap<String, String>(),
+      "--num_coordinators=1");
+    assertEquals(expectedRetCode, ret);
+  }
+
   @After
   public void cleanUp() throws Exception {
     // Leave a cluster running with the default configuration, then delete temporary
@@ -178,7 +201,8 @@ public class JwtHttpTest {
   public void testJwtAuth() throws Exception {
     createJWKSForWebServer_ = false;
     String jwksFilename =
-        new File(System.getenv("IMPALA_HOME"), "testdata/jwt/jwks_rs256.json").getPath();
+        new File(System.getenv("IMPALA_HOME"),
+        String.format("testdata/jwt/%s", JWKS_FILE_NAME)).getPath();
     setUp(String.format(
         "--jwt_token_auth=true --jwt_validate_signature=true --jwks_file_path=%s "
             + "--jwt_allow_without_tls=true",
@@ -404,7 +428,7 @@ public class JwtHttpTest {
     String jwksHttpUrl = "https://localhost:25010/www/temp_jwks.json";
     String impaladJwtArgs = String.format("--jwt_token_auth=true "
         + "--jwt_validate_signature=true --jwks_url=%s "
-        + "--jwt_allow_without_tls=true --log_dir=%s ",
+        + "--jwt_allow_without_tls=true --log_dir=%s --logbuflevel=-1 ",
         jwksHttpUrl, logDir.toAbsolutePath());
     String expectedErrString = String.format("Impalad services did not start correctly, "
         + "exiting.  Error: Error downloading JWKS from '%s': Network error: curl "
@@ -413,14 +437,9 @@ public class JwtHttpTest {
 
     // cluster start will fail because the TLS cert returned by the
     // JWKS server is not trusted
-    setUp(impaladJwtArgs, "", statestoreWebserverArgs, 1);
-
-    // check in the impalad logs that the server startup failed for the expected reason
-    List<String> logLines = Files.readAllLines(logDir.resolve("impalad.ERROR"));
+    setUpWithSingleCoordinator(impaladJwtArgs, "", statestoreWebserverArgs, 1);
 
-    assertThat(String.format("Impalad startup failed but not for the expected reason. "
-        + "See logs in the '%s' folder for details.", logDir), logLines,
-        hasItem(containsString(expectedErrString)));
+    checkCoordinatorLogs(expectedErrString, logDir);
   }
 
   /**
@@ -448,8 +467,9 @@ public class JwtHttpTest {
     String jwksHttpUrl = "https://localhost:25010/www/temp_jwks.json";
     String impaladJwtArgs = String.format("--jwt_token_auth=true "
         + "--jwt_validate_signature=true --jwks_url=%s "
-        + "--jwt_allow_without_tls=true --log_dir=%s --jwks_ca_certificate=%s ",
-        jwksHttpUrl, logDir.toAbsolutePath(), Paths.get(certDir, CA_CERT));
+        + "--jwt_allow_without_tls=true --log_dir=%s --jwks_ca_certificate=%s "
+        + "--logbuflevel=-1 ", jwksHttpUrl, logDir.toAbsolutePath(),
+        Paths.get(certDir, CA_CERT));
     String expectedErrString = String.format("Impalad services did not start correctly, "
         + "exiting.  Error: Error downloading JWKS from '%s': Network error: curl "
         + "error: SSL peer certificate or SSH remote key was not OK: SSL: "
@@ -458,13 +478,9 @@ public class JwtHttpTest {
 
     // cluster start will fail because the TLS cert returned by the
     // JWKS server is not trusted
-    setUp(impaladJwtArgs, "", statestoreWebserverArgs, 1);
+    setUpWithSingleCoordinator(impaladJwtArgs, "", statestoreWebserverArgs, 1);
 
-    // check in the impalad logs that the server startup failed for the expected reason
-    List<String> logLines = Files.readAllLines(logDir.resolve("impalad.ERROR"));
-    assertThat(String.format("Impalad startup failed but not for the expected reason. "
-        + "See logs in the '%s' folder for details.", logDir), logLines,
-        hasItem(containsString(expectedErrString)));
+    checkCoordinatorLogs(expectedErrString, logDir);
   }
 
   /**
@@ -524,4 +540,35 @@ public class JwtHttpTest {
 
     return certDir.toString();
   }
+
+  /**
+   * Asserts that the specified string is present in the impalad.ERROR file within the
+   * specified log directory.
+   *
+   * @param expectedString The impalad.ERROR file is searched for this string.  If it is
+   *                       not found, the test fails.
+   * @param logDir         Location of the directory where log files are stored.
+   */
+  private void checkCoordinatorLogs(String expectedString, Path logDir)
+      throws IOException, InterruptedException {
+    // check in the impalad logs that the server startup failed for the expected reason
+    List<String> logLines = null;
+    Matcher<Iterable<? super String>> m = hasItem(containsString(expectedString));
+
+    // writing logs to disk may take some time, try a few times to search for the
+    // expected error in the log
+    for (int i=0; i<10; i++) {
+      logLines = Files.readAllLines(logDir.resolve("impalad.ERROR"));
+      if (m.matches(logLines)) {
+        break;
+      }
+      Thread.sleep(250);
+    }
+
+    // runs the matcher one more time to ensure a descriptive failure message is
+    // generated if the assert fails
+    assertThat(String.format("Impalad startup failed but not for the expected reason. "
+        + "See logs in the '%s' folder for details.", logDir), logLines,
+        hasItem(containsString(expectedString)));
+  }
 }