Posted to commits@impala.apache.org by bo...@apache.org on 2022/03/08 12:53:18 UTC
[impala] 02/02: IMPALA-10999 Flakiness in TestAsyncLoadData.test_async_load

This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 2d6300ac59918d6e8a45a5d4669e252f177d6fcc
Author: Qifan Chen <qc...@cloudera.com>
AuthorDate: Wed Feb 23 17:23:55 2022 -0500

    IMPALA-10999 Flakiness in TestAsyncLoadData.test_async_load
    
    This patch addresses the flakiness by allowing RUNNING to be a valid
    exec state returned from execute_query_async_using_client() in
    Python. This call submits the load query to the Impala backend.
    
    The corresponding Impala backend code for the beeswax protocol is
    ImpalaServer::query(), which uses a wait thread executing
    ClientRequestState::Wait() to set the exec state from RUNNING to
    FINISHED. Sometimes this wait thread does not run soon enough to do
    so, and the state returned from the main thread is still RUNNING.
    
    The fix is purely a modification to the test itself.
    
    Testing:
      1. ran core test successfully
    
    Change-Id: Ic2ac954b0494b7413ce0ec405718fcc354dba9e0
    Reviewed-on: http://gerrit.cloudera.org:8080/18268
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 tests/metadata/test_load.py | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)
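
The race described in the commit message can be sketched in isolation as
follows. This is only an illustration, not Impala code: FakeQuery,
state_after_submit() and the delay parameter are hypothetical names, and a
plain Python thread stands in for the backend wait thread that moves the
exec state from RUNNING to FINISHED.

```python
import threading
import time

RUNNING, FINISHED = "RUNNING", "FINISHED"


class FakeQuery:
    """Hypothetical stand-in for a submitted query; not part of Impala."""

    def __init__(self, wait_thread_delay_s):
        self.state = RUNNING
        self._done = threading.Event()
        # Plays the role of the wait thread that flips the exec state.
        self._thread = threading.Thread(
            target=self._wait, args=(wait_thread_delay_s,))
        self._thread.start()

    def _wait(self, delay_s):
        time.sleep(delay_s)  # simulated work before the state changes
        self.state = FINISHED
        self._done.set()

    def wait_for_finish(self):
        # Block until the wait thread has published the final state.
        self._done.wait()
        self._thread.join()
        return self.state


def state_after_submit(wait_thread_delay_s):
    """Return (state seen right after submit, state after waiting)."""
    query = FakeQuery(wait_thread_delay_s)
    observed = query.state  # may be RUNNING or FINISHED, depending on timing
    final = query.wait_for_finish()
    return observed, final
```

Because the state read right after submission depends on thread scheduling,
a check on it is only stable if it accepts both values, while the state read
after waiting can be compared against FINISHED directly.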

diff --git a/tests/metadata/test_load.py b/tests/metadata/test_load.py
index bf2aa77..d2ec060 100644
--- a/tests/metadata/test_load.py
+++ b/tests/metadata/test_load.py
@@ -128,6 +128,9 @@ class TestAsyncLoadData(ImpalaTestSuite):
     cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
         disable_codegen_options=[False]))
 
+  # This test runs the load with either sync or async compilation of the load
+  # query at the backend, through beeswax or hs2 clients. The objective is to
+  # ensure that the load query completes successfully.
   def test_async_load(self, vector, unique_database):
     enable_async_load_data = vector.get_value('enable_async_load_data_execution')
     protocol = vector.get_value('protocol')
@@ -182,19 +185,21 @@ class TestAsyncLoadData(ImpalaTestSuite):
       wait_end = time.time()
       wait_time = wait_end - wait_start
       self.close_query_using_client(client, handle)
-      # In sync mode:
-      #  The entire LOAD is processed in the exec step with delay. exec_time should be
-      #  more than 3 seconds.
-      #
-      # In async mode:
-      #  The compilation of LOAD is processed in the exec step without delay. And the
-      #  processing of the LOAD plan is in wait step with delay. The wait time should
-      #  definitely take more time than 3 seconds.
       if enable_async_load_data:
+        # In async mode:
+        #  The compilation of LOAD is processed in the exec step without delay. And the
+        #  processing of the LOAD plan is in wait step with delay. The wait time should
+        #  definitely take more time than 3 seconds.
         assert(exec_end_state == running_state)
         assert(wait_time >= 3)
       else:
-        assert(exec_end_state == finished_state)
+        # In sync mode:
+        #  The entire LOAD is processed in the exec step with delay. exec_time should be
+        #  more than 3 seconds. Since the load query is submitted async, the exec state
+        #  returned may still be RUNNING if the wait thread executing
+        #  ClientRequestState::Wait() has not yet had time to set the exec state from
+        #  RUNNING to FINISHED.
+        assert(exec_end_state == running_state or exec_end_state == finished_state)
         assert(exec_time >= 3)
     finally:
       client.close()