Posted to commits@impala.apache.org by ta...@apache.org on 2019/11/26 06:49:44 UTC
[impala] 03/04: IMPALA-8867: Further deflake test_auto_scaling

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit ddd07e17136302aee58c44060c0f93876d00294c
Author: Lars Volker <lv...@cloudera.com>
AuthorDate: Mon Nov 18 16:51:25 2019 -0800

    IMPALA-8867: Further deflake test_auto_scaling
    
    A previous attempt to deflake this test by lowering the threshold for
    multi-group throughput improved things, but we still saw another
    occurrence of test_auto_scaling failing recently. This change lowers the
    threshold even further, to the single-group maximum, to try to eliminate
    the flakiness. From inspecting the logs of the failed run I could see that
    the new threshold would have prevented the failure.
    
    Change-Id: I29808982cc6226152c544cb99f76961b582975a7
    Reviewed-on: http://gerrit.cloudera.org:8080/14740
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Lars Volker <lv...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 tests/custom_cluster/test_auto_scaling.py | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tests/custom_cluster/test_auto_scaling.py b/tests/custom_cluster/test_auto_scaling.py
index deab85a..6f999b0 100644
--- a/tests/custom_cluster/test_auto_scaling.py
+++ b/tests/custom_cluster/test_auto_scaling.py
@@ -99,8 +99,10 @@ class TestAutoScaling(CustomClusterTestSuite):
       assert self.impalad_test_service.get_metric_value(
         "cluster-membership.executor-groups.total-healthy") >= 2
 
-      # Wait for query rate to reach the maximum for a single executor group plus 20%
-      min_query_rate = 1.2 * EXECUTOR_SLOTS
+      # Wait for query rate to exceed the maximum for a single executor group. In the past
+      # we tried to wait for it to pass a higher threshold but on some platforms we saw
+      # that it was too flaky.
+      min_query_rate = EXECUTOR_SLOTS
       max_query_rate = 0
       # This barrier has been flaky in the past so we wait 2x as long as for the other
       # checks.
@@ -109,7 +111,7 @@ class TestAutoScaling(CustomClusterTestSuite):
         current_rate = workload.get_query_rate()
         LOG.info("Current rate: %s" % current_rate)
         max_query_rate = max(max_query_rate, current_rate)
-        if max_query_rate >= min_query_rate:
+        if max_query_rate > min_query_rate:
           break
         sleep(1)
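The barrier in the second hunk polls the workload's query rate, keeps the peak it has seen, and stops once that peak strictly exceeds the threshold. Below is a minimal standalone sketch of the same pattern in Python 3. `wait_for_query_rate`, its `timeout_s`, `sleep` and `clock` parameters, and the value 3 for `EXECUTOR_SLOTS` are all hypothetical: the loop header, the real timeout, and the actual constant are not shown in this excerpt.

```python
import time


def wait_for_query_rate(get_rate, min_query_rate, timeout_s,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_rate() once per second until the highest observed rate
    strictly exceeds min_query_rate, or until timeout_s seconds elapse.

    Returns the peak rate seen. Hypothetical helper, not part of Impala;
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    max_query_rate = 0
    deadline = clock() + timeout_s
    while clock() < deadline:
        current_rate = get_rate()
        # Track the peak, so a single high sample is enough to pass.
        max_query_rate = max(max_query_rate, current_rate)
        # Strict comparison, mirroring the change from ">=" to ">" in the patch.
        if max_query_rate > min_query_rate:
            break
        sleep(1)
    return max_query_rate


if __name__ == "__main__":
    EXECUTOR_SLOTS = 3  # hypothetical value for illustration only
    samples = iter([1.0, 2.5, 3.0, 3.4, 3.1])
    peak = wait_for_query_rate(lambda: next(samples), EXECUTOR_SLOTS,
                               timeout_s=60, sleep=lambda s: None)
    # The caller decides what to do with the peak, e.g. assert on it.
    assert peak > EXECUTOR_SLOTS, peak
    print(peak)
```

With these made-up samples, a peak of 3.4 clears the new threshold of `EXECUTOR_SLOTS` (3) but would not have cleared the old one of `1.2 * EXECUTOR_SLOTS` (about 3.6), which is the kind of near-miss the commit message describes. Note also that a sample of exactly 3.0 does not pass, because the comparison is now strict.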