Posted to commits@impala.apache.org by jo...@apache.org on 2022/05/23 22:43:17 UTC

[impala] branch master updated (09a297a27 -> 861d63f74)

This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


    from 09a297a27 IMPALA-10503: Use larger yarn containers for dataload
     new 38b402fc8 [tools] Add Kerberos gen files to shell/.gitignore
     new fed0c6b32 IMPALA-11183: Fix run-all-tests.sh can't repeat tests more than once
     new 33724d623 IMPALA-11311: Fixed debug_noopt build directory
     new cf5eaae17 IMPALA-11305: Fix TypeError in impala-shell summary progress
     new 861d63f74 IMPALA-7864: (Addendum) Deflake test_replan_limit by postponing catalog fetches

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/CMakeLists.txt                          |  1 +
 be/src/exec/catalog-op-executor.cc         |  5 +++++
 bin/run-all-tests.sh                       | 29 ++++++++++++++++++++++-------
 buildall.sh                                |  3 ++-
 shell/.gitignore                           |  1 +
 shell/impala_shell.py                      |  2 +-
 tests/custom_cluster/test_local_catalog.py | 12 ++++++++----
 tests/run-custom-cluster-tests.sh          |  7 ++++++-
 tests/run-tests.py                         |  2 +-
 9 files changed, 47 insertions(+), 15 deletions(-)

[impala] 03/05: IMPALA-11311: Fixed debug_noopt build directory

Posted by jo...@apache.org.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 33724d623f29132721eef81fa296de01bc516e94
Author: Gergely Fürnstáhl <gf...@cloudera.com>
AuthorDate: Mon May 23 14:32:42 2022 +0200

    IMPALA-11311: Fixed debug_noopt build directory
    
    The debug_noopt build used the "release" output directory by default;
    this changes it to "debug".
    
    Change-Id: I202065ca25ba622954ac11526e1c55db0f0e8a1c
    Reviewed-on: http://gerrit.cloudera.org:8080/18555
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/CMakeLists.txt | 1 +
 buildall.sh       | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt
index 51c709e35..66e8bea50 100644
--- a/be/CMakeLists.txt
+++ b/be/CMakeLists.txt
@@ -380,6 +380,7 @@ set(CMAKE_POSITION_INDEPENDENT_CODE ON)
 
 # set compile output directory
 if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG" OR
+    "${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG_NOOPT" OR
     "${CMAKE_BUILD_TYPE}" STREQUAL "ADDRESS_SANITIZER" OR
     "${CMAKE_BUILD_TYPE}" STREQUAL "UBSAN" OR
     "${CMAKE_BUILD_TYPE}" STREQUAL "UBSAN_FULL" OR
diff --git a/buildall.sh b/buildall.sh
index dccf52c90..b6d2da415 100755
--- a/buildall.sh
+++ b/buildall.sh
@@ -87,7 +87,8 @@ export MAKE_CMD=make
 
 # parse command line options
 # Note: if you add a new build type, please also add it to 'VALID_BUILD_TYPES' in
-# tests/common/environ.py.
+# tests/common/environ.py and set correct BUILD_OUTPUT_ROOT_DIRECTORY directory in
+# be/CMakeLists.txt.
 while [ -n "$*" ]
 do
   case "$1" in
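
The effect of the CMake change above can be sketched in Python. This is only an illustration under stated assumptions: the set below lists just the build types visible in the quoted hunk (the real condition continues past it), and the function name and the "debug"/"release" return values are hypothetical stand-ins for whatever BUILD_OUTPUT_ROOT_DIRECTORY is actually set to.

```python
# Build types that the visible part of the condition maps to the debug output
# directory. The real CMake condition continues beyond the quoted hunk, so this
# set is deliberately incomplete.
DEBUG_LIKE_BUILD_TYPES = {
    "DEBUG",
    "DEBUG_NOOPT",  # the entry added by this commit
    "ADDRESS_SANITIZER",
    "UBSAN",
    "UBSAN_FULL",
}


def output_subdir(build_type):
    """Return a hypothetical output subdirectory name for a build type."""
    return "debug" if build_type in DEBUG_LIKE_BUILD_TYPES else "release"


print(output_subdir("DEBUG_NOOPT"))
print(output_subdir("RELEASE"))
```

Before the fix, DEBUG_NOOPT was missing from the condition and therefore fell through to the default branch, which is the "release" behavior the commit message describes.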

[impala] 01/05: [tools] Add Kerberos gen files to shell/.gitignore

Posted by jo...@apache.org.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 38b402fc81593f10003955efc6e84adb4266066c
Author: Michael Smith <mi...@cloudera.com>
AuthorDate: Fri May 20 14:49:49 2022 -0700

    [tools] Add Kerberos gen files to shell/.gitignore
    
    Ignore the Python Kerberos module's generated files. Only dist needs to
    be ignored because build contains only a .so (ignored by the root
    .gitignore) and egg-info is committed.
    
    Change-Id: If09d018cef130455a01b06a43fe612d66eead660
    Reviewed-on: http://gerrit.cloudera.org:8080/18548
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 shell/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/shell/.gitignore b/shell/.gitignore
index b7bb32270..021f9f416 100644
--- a/shell/.gitignore
+++ b/shell/.gitignore
@@ -3,6 +3,7 @@
 ext-py/bitarray-2.3.0/bitarray.egg-info/
 ext-py/bitarray-2.3.0/dist/
 ext-py/bitarray-2.3.0/build/
+ext-py/kerberos-1.3.1/dist/
 ext-py/prettytable-0.7.2/dist/
 ext-py/prettytable-0.7.2/build/
 ext-py/prettytable-0.7.2/prettytable.egg-info

[impala] 04/05: IMPALA-11305: Fix TypeError in impala-shell summary progress

Posted by jo...@apache.org.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit cf5eaae17634edd99f40bd40e83ea1f1248e4bf1
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Thu May 19 21:12:32 2022 -0700

    IMPALA-11305: Fix TypeError in impala-shell summary progress
    
    impala-shell fails with a TypeError when installed with python3. This is
    due to the behavior change of the division operator ('/') between python2
    and python3. This patch fixes the issue by replacing the operator with
    floor division ('//'), which results in an integer type as described in
    https://peps.python.org/pep-0238/.
    
    Testing:
    - Manually installed impala-shell from pip with python3 and verified
      that the fix works.
    
    Change-Id: Ifbe4df6a7a4136e590f383fc6475e2283e35eadc
    Reviewed-on: http://gerrit.cloudera.org:8080/18546
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Michael Smith <mi...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 shell/impala_shell.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/shell/impala_shell.py b/shell/impala_shell.py
index a2bd4fb2d..50ad5eaa1 100755
--- a/shell/impala_shell.py
+++ b/shell/impala_shell.py
@@ -1245,7 +1245,7 @@ class ImpalaShell(cmd.Cmd, object):
           return
 
         if self.live_progress and progress.total_scan_ranges > 0:
-          val = ((summary.progress.num_completed_scan_ranges * 100) /
+          val = ((summary.progress.num_completed_scan_ranges * 100) //
                  summary.progress.total_scan_ranges)
           fragment_text = "[%s%s] %s%%\n" % ("#" * val, " " * (100 - val), val)
           data += fragment_text
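
The failure mode fixed above can be reproduced in isolation. The following is a minimal sketch, assuming Python 3; progress_bar is a hypothetical helper that mirrors the formatting expression in the diff and is not part of impala_shell.py.

```python
def progress_bar(num_completed, total):
    """Render a 100-character progress bar like the expression in the diff."""
    val = (num_completed * 100) // total  # floor division keeps val an int
    return "[%s%s] %s%%" % ("#" * val, " " * (100 - val), val)


completed, total = 3, 8

# Under Python 3, true division yields a float even for int operands.
true_div_result = (completed * 100) / total

# A str cannot be repeated by a float, which is the TypeError from the report.
try:
    "#" * true_div_result
    raised = False
except TypeError as e:
    raised = True
    print("TypeError:", e)

bar = progress_bar(completed, total)
print(bar)
```

With '//' the value stays an int under both Python 2 and Python 3, so the string repetition works in either interpreter.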

[impala] 02/05: IMPALA-11183: Fix run-all-tests.sh can't repeat tests more than once

Posted by jo...@apache.org.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit fed0c6b321e835f768d947e0abb9f8051486fb84
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Tue Mar 15 20:13:24 2022 +0800

    IMPALA-11183: Fix run-all-tests.sh can't repeat tests more than once
    
    run-all-tests.sh launches a background process that checks whether the
    tests have timed out. When NUM_TEST_ITERATIONS is set to a value larger
    than 1, run-all-tests.sh repeats the tests. However, the timeout process
    is killed at the end of each iteration, which fails the script when we
    want to repeat tests. This patch moves the killing logic outside the loop.
    
    This patch also adds a new variable, CLUSTER_TEST_FILES, to specify
    a particular custom-cluster test to run.
    
    To speed up test iterations, this patch avoids always restarting the
    Impala cluster. E.g. when we just need to run a particular EE test, we
    only need to start the Impala cluster once.
    
    Tested with NUM_TEST_ITERATIONS=10 and verified the following
    scenarios.
    
    1) custom-cluster test only
    export BE_TEST, FE_TEST, JDBC_TEST, EE_TEST to false
    export CLUSTER_TEST=true and CLUSTER_TEST_FILES to the following values:
    custom_cluster/test_local_catalog.py
    custom_cluster/test_local_catalog.py::TestLocalCatalogRetries
    custom_cluster/test_local_catalog.py::TestLocalCatalogRetries::test_replan_limit
    "custom_cluster/test_local_catalog.py -k replan_limit"
    
    2) e2e test only
    export BE_TEST, FE_TEST, JDBC_TEST, CLUSTER_TEST to false
    export EE_TEST=true and
    EE_TEST_FILES=query_test/test_scanners.py::TestParquet::test_multiple_blocks_mt_dop
    
    Change-Id: I2bdd8a9c68ffb0dd1c3ea72c3649b00abcc05a49
    Reviewed-on: http://gerrit.cloudera.org:8080/18328
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 bin/run-all-tests.sh              | 29 ++++++++++++++++++++++-------
 tests/run-custom-cluster-tests.sh |  7 ++++++-
 tests/run-tests.py                |  2 +-
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh
index 329b866a8..9f802ba49 100755
--- a/bin/run-all-tests.sh
+++ b/bin/run-all-tests.sh
@@ -51,6 +51,7 @@ fi
 : ${JDBC_TEST:=true}
 # Run Cluster Tests
 : ${CLUSTER_TEST:=true}
+: ${CLUSTER_TEST_FILES:=}
 # Extra arguments passed to start-impala-cluster for tests. These do not apply to custom
 # cluster tests.
 : ${TEST_START_CLUSTER_ARGS:=}
@@ -192,12 +193,22 @@ run_ee_tests() {
 
 for i in $(seq 1 $NUM_TEST_ITERATIONS)
 do
+  echo "Test iteration $i"
   TEST_RET_CODE=0
 
   # Store a list of the files at the beginning of each iteration.
   hdfs dfs -ls -R /test-warehouse > ${IMPALA_LOGS_DIR}/file-list-begin-${i}.log 2>&1
 
-  start_impala_cluster
+  # Try not restarting the cluster to save time. BE, FE, JDBC and EE tests require
+  # running on a cluster with default flags. We just need to restart the cluster when
+  # there are custom-cluster tests which will leave the cluster running with specific
+  # flags.
+  if [[ "$BE_TEST" == true || "$FE_TEST" == true || "$EE_TEST" == true
+      || "$JDBC_TEST" == true ]]; then
+    if [[ $i == 1 || "$CLUSTER_TEST" == true ]]; then
+      start_impala_cluster
+    fi
+  fi
 
   if [[ "$BE_TEST" == true ]]; then
     if [[ "$TARGET_FILESYSTEM" == "local" ]]; then
@@ -325,12 +336,16 @@ do
   # the list of files is from dataload.
   hdfs dfs -ls -R /test-warehouse > ${IMPALA_LOGS_DIR}/file-list-end-${i}.log 2>&1
 
-  # Finally, kill the spawned timeout process and its child sleep process.
-  # There may not be a sleep process, so ignore failure.
-  pkill -P $TIMEOUT_PID || true
-  kill $TIMEOUT_PID
-
   if [[ $TEST_RET_CODE == 1 ]]; then
-    exit $TEST_RET_CODE
+    break
   fi
 done
+
+# Finally, kill the spawned timeout process and its child sleep process.
+# There may not be a sleep process, so ignore failure.
+pkill -P $TIMEOUT_PID || true
+kill $TIMEOUT_PID
+
+if [[ $TEST_RET_CODE == 1 ]]; then
+  exit $TEST_RET_CODE
+fi
diff --git a/tests/run-custom-cluster-tests.sh b/tests/run-custom-cluster-tests.sh
index 6b77e262f..1484bb6a7 100755
--- a/tests/run-custom-cluster-tests.sh
+++ b/tests/run-custom-cluster-tests.sh
@@ -36,8 +36,13 @@ mkdir -p "${RESULTS_DIR}"
 cd "${IMPALA_HOME}/tests"
 . "${IMPALA_HOME}/bin/set-classpath.sh" &> /dev/null
 
+: ${CLUSTER_TEST_FILES:=}
+if [[ "$CLUSTER_TEST_FILES" != "" ]]; then
+  ARGS=($CLUSTER_TEST_FILES)
+else
+  ARGS=(custom_cluster/ authorization/)
+fi
 AUX_CUSTOM_DIR="${IMPALA_AUX_TEST_HOME}/tests/aux_custom_cluster_tests/"
-ARGS=(custom_cluster/ authorization/)
 if [[ -d "${AUX_CUSTOM_DIR}" ]]
 then
   ARGS+=("${AUX_CUSTOM_DIR}")
diff --git a/tests/run-tests.py b/tests/run-tests.py
index 8168cd135..0ddabff4b 100755
--- a/tests/run-tests.py
+++ b/tests/run-tests.py
@@ -47,7 +47,7 @@ VALID_TEST_DIRS = ['failure', 'query_test', 'stress', 'unittests', 'aux_query_te
 TEST_HELPER_DIRS = ['aux_parquet_data_load', 'comparison', 'benchmark',
                      'custom_cluster', 'util', 'experiments', 'verifiers', 'common',
                      'performance', 'beeswax', 'aux_custom_cluster_tests',
-                     'authorization']
+                     'authorization', 'test-hive-udfs']
 
 TEST_DIR = os.path.join(os.environ['IMPALA_HOME'], 'tests')
 RESULT_DIR = os.path.join(os.environ['IMPALA_EE_TEST_LOGS_DIR'], 'results')
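
Two pieces of shell logic in the diffs above are easier to check as plain functions. This is a rough Python sketch, not the scripts themselves: both function names are hypothetical, and str.split() only approximates bash word splitting of the unquoted $CLUSTER_TEST_FILES (bash would additionally apply pathname expansion).

```python
def should_start_cluster(iteration, be, fe, ee, jdbc, cluster):
    """Mirror the restart condition added to bin/run-all-tests.sh."""
    needs_default_cluster = be or fe or ee or jdbc
    return needs_default_cluster and (iteration == 1 or cluster)


def custom_cluster_args(cluster_test_files):
    """Mirror the ARGS selection in tests/run-custom-cluster-tests.sh."""
    if cluster_test_files != "":
        return cluster_test_files.split()
    return ["custom_cluster/", "authorization/"]


# EE tests only, three iterations: the cluster is started in iteration 1 only.
ee_only_starts = [
    i for i in range(1, 4)
    if should_start_cluster(i, False, False, True, False, False)
]

# EE plus custom-cluster tests: the cluster is restarted in every iteration.
ee_and_cluster_starts = [
    i for i in range(1, 4)
    if should_start_cluster(i, False, False, True, False, True)
]

print(ee_only_starts, ee_and_cluster_starts)
print(custom_cluster_args("custom_cluster/test_local_catalog.py -k replan_limit"))
```

The last call corresponds to the quoted value in the commit message: because the variable is expanded unquoted, the "-k replan_limit" part reaches the test runner as separate arguments.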

[impala] 05/05: IMPALA-7864: (Addendum) Deflake test_replan_limit by postponing catalog fetches

Posted by jo...@apache.org.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 861d63f74823b8165cd87028875f4429e59dedaf
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Wed May 18 16:51:37 2022 +0800

    IMPALA-7864: (Addendum) Deflake test_replan_limit by postponing catalog fetches
    
    TestLocalCatalogRetries.test_replan_limit runs REFRESH and SELECT
    queries concurrently on a table, and expects one of the queries to hit
    inconsistent metadata.
    
    This patch increases the chance of inconsistent metadata by injecting
    a latency (500ms) before each catalog fetch, so it is more likely that a
    request fetches stale metadata. It also bumps up the timeout of
    thread.join() so we can try out all the attempts.
    
    Test
     - Ran test_replan_limit 1000 times without any error.
     - Ran all tests of TestLocalCatalogRetries 100 times without any error.
    
    Change-Id: Ia5bdca7402039f1f24b7bf19595c2541fa32d0ad
    Reviewed-on: http://gerrit.cloudera.org:8080/18537
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/catalog-op-executor.cc         |  5 +++++
 tests/custom_cluster/test_local_catalog.py | 12 ++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)
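
The commit message above pairs the injected latency with a longer thread.join() timeout. A minimal sketch of why the two go together, assuming a hypothetical worker whose attempts are each delayed by a fixed sleep (the durations and attempt count are scaled down and are not the values used by the test):

```python
import threading
import time

INJECTED_LATENCY_S = 0.05  # scaled-down stand-in for the injected latency
NUM_ATTEMPTS = 5           # scaled-down stand-in for the test's attempt count

completed = []


def worker():
    """Hypothetical query loop: every attempt pays the injected latency."""
    for attempt in range(1, NUM_ATTEMPTS + 1):
        time.sleep(INJECTED_LATENCY_S)
        completed.append(attempt)


t = threading.Thread(target=worker)
t.start()

# join(timeout) returns once the timeout expires, even if the thread is still
# running, so a short timeout lets the caller move on before all attempts ran.
t.join(0.01)
still_running_after_short_join = t.is_alive()

# A generous timeout gives the worker room to finish every attempt.
t.join(5)
finished_after_long_join = not t.is_alive()

print(still_running_after_short_join, finished_after_long_join, completed)
```

Since join() does not raise when its timeout expires, a timeout that is too short for the slowed-down attempts would silently cut the workload short rather than fail loudly.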

diff --git a/be/src/exec/catalog-op-executor.cc b/be/src/exec/catalog-op-executor.cc
index 646e6aa52..c6c245428 100644
--- a/be/src/exec/catalog-op-executor.cc
+++ b/be/src/exec/catalog-op-executor.cc
@@ -55,6 +55,8 @@ DECLARE_int32(catalog_client_connection_num_retries);
 DECLARE_int32(catalog_client_rpc_timeout_ms);
 DECLARE_int32(catalog_client_rpc_retry_interval_ms);
 
+DEFINE_int32_hidden(inject_latency_before_catalog_fetch_ms, 0,
+    "Latency (ms) to be injected before fetching catalog data from the catalogd");
 DEFINE_int32_hidden(inject_latency_after_catalog_fetch_ms, 0,
     "Latency (ms) to be injected after fetching catalog data from the catalogd");
 
@@ -366,6 +368,9 @@ Status CatalogOpExecutor::GetPartialCatalogObject(
   DCHECK(FLAGS_use_local_catalog || TestInfo::is_test());
   const TNetworkAddress& address =
       MakeNetworkAddress(FLAGS_catalog_service_host, FLAGS_catalog_service_port);
+  if (FLAGS_inject_latency_before_catalog_fetch_ms > 0) {
+    SleepForMs(FLAGS_inject_latency_before_catalog_fetch_ms);
+  }
   int attempt = 0; // Used for debug action only.
   CatalogServiceConnection::RpcStatus rpc_status =
       CatalogServiceConnection::DoRpcWithRetry(env_->catalogd_client_cache(), address,
diff --git a/tests/custom_cluster/test_local_catalog.py b/tests/custom_cluster/test_local_catalog.py
index 63b0cbb91..6e74a4de0 100644
--- a/tests/custom_cluster/test_local_catalog.py
+++ b/tests/custom_cluster/test_local_catalog.py
@@ -273,8 +273,9 @@ class TestLocalCatalogRetries(CustomClusterTestSuite):
           q = random.choice(queries)
           attempt += 1
           try:
+            print 'Attempt', attempt, 'client', str(client)
             ret = self.execute_query_unchecked(client, q)
-          except Exception, e:
+          except Exception as e:
             if 'InconsistentMetadataFetchException' in str(e):
               with inconsistent_seen_lock:
                 inconsistent_seen[0] += 1
@@ -287,7 +288,8 @@ class TestLocalCatalogRetries(CustomClusterTestSuite):
         t.start()
       for t in threads:
         # When there are failures, they're observed quickly.
-        t.join(30)
+        # 600s is enough for 200 attempts.
+        t.join(600)
 
       assert failed_queries.empty(),\
           "Failed query count non zero: %s" % list(failed_queries.queue)
@@ -318,7 +320,8 @@ class TestLocalCatalogRetries(CustomClusterTestSuite):
 
   @pytest.mark.execute_serially
   @CustomClusterTestSuite.with_args(
-      impalad_args="--use_local_catalog=true --local_catalog_max_fetch_retries=0",
+      impalad_args="--use_local_catalog=true --local_catalog_max_fetch_retries=0"
+                   " --inject_latency_before_catalog_fetch_ms=500",
       catalogd_args="--catalog_topic_mode=minimal")
   def test_replan_limit(self):
     """
@@ -326,7 +329,8 @@ class TestLocalCatalogRetries(CustomClusterTestSuite):
     an inconsistent metadata exception when running concurrent reads/writes
     is seen. With the max retries set to 0, no retries are expected and with
     the concurrent read/write workload, an inconsistent metadata exception is
-    expected.
+    expected. Setting inject_latency_before_catalog_fetch_ms increases the
+    possibility of a stale request, which throws the expected exception.
     """
     queries = [
       'refresh functional.alltypes',