Posted to commits@impala.apache.org by st...@apache.org on 2021/04/02 07:54:58 UTC

[impala] 02/05: IMPALA-10259 (part 2): Fixed DCHECK error for backend in terminal state

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d621c68136b4a47388f579de5e3fe9fd3372bd68
Author: wzhou-code <wz...@cloudera.com>
AuthorDate: Wed Mar 31 20:42:05 2021 -0700

    IMPALA-10259 (part 2): Fixed DCHECK error for backend in terminal state
    
    The previous patch tried to fix the race by making backends avoid
    sending a status report with the fragment instance state as "done"
    and the overall_status as OK after a fragment instance fails. But it
    does not work when the fragment instance state is updated while the
    status report is being generated.
    For a failed fragment instance, the backend should report the
    instance as "done" only when overall_status is reported with an
    error. The final fragment instance state will be reported in the
    final status report. This prevents the coordinator from ignoring the
    last status report (a sketch of this decision follows the diff
    below).
    
    Testing:
     - Manual tests
       I could only reproduce the situation by adding some artificial
       delays in the QueryState::ConstructReport() after setting
       overall_status for the status report when repeatedly running
       test case test_spilling.py::TestSpillingNoDebugActionDimensions
       ::test_spilling_no_debug_action. Verified that the issue did
       not happen after applying this patch.
     - Passed exhaustive test.
    
    Change-Id: Ifd9820f9944a78811ee7acfa5870a9418902b17b
    Reviewed-on: http://gerrit.cloudera.org:8080/17258
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/runtime/fragment-instance-state.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/be/src/runtime/fragment-instance-state.cc b/be/src/runtime/fragment-instance-state.cc
index ca7ab1c..df546c1 100644
--- a/be/src/runtime/fragment-instance-state.cc
+++ b/be/src/runtime/fragment-instance-state.cc
@@ -113,6 +113,7 @@ done:
       current_state_.Load() > FInstanceExecStatePB::WAITING_FOR_PREPARE);
 
   if (!status.ok()) {
+    exec_failed_.Store(true);
     if (!is_prepared) {
       UpdateState(StateEvent::EXEC_END);
 
@@ -124,7 +125,6 @@ done:
       // with GetStatusReport(). This may lead to report the instance as "completed"
       // without error, or the "final" profile being sent with the 'done' flag as false.
 
-      exec_failed_.Store(true);
       // Tell the managing 'QueryState' that we hit an error during execution.
       query_state_->ErrorDuringExecute(status, instance_id());
 
@@ -284,7 +284,9 @@ void FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* instan
   }
   const TUniqueId& finstance_id = instance_id();
   TUniqueIdToUniqueIdPB(finstance_id, instance_status->mutable_fragment_instance_id());
-  const bool done = IsDone() || (ExecFailed() && !overall_status.ok());
+  // For a failed fragment instance, report it as "done" only when overall_status is
+  // reported with an error. This prevents the coordinator from ignoring the last report.
+  const bool done = (IsDone() && !ExecFailed()) || (ExecFailed() && !overall_status.ok());
   instance_status->set_done(done);
   instance_status->set_current_state(current_state());
   DCHECK(profile() != nullptr);
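
For illustration, a minimal, self-contained sketch of the "done" decision
described above. This is not Impala code: ReportAsDone(), the simplified
Status struct, and the boolean parameters are hypothetical stand-ins for
FragmentInstanceState::GetStatusReport(), impala::Status, IsDone() and
ExecFailed().

  // Sketch only: mirrors the reporting rule added by this patch.
  #include <iostream>

  struct Status {
    bool ok_;
    bool ok() const { return ok_; }
  };

  // 'is_done' stands in for IsDone() (instance reached a terminal state),
  // 'exec_failed' for ExecFailed() (Exec() hit an error).
  bool ReportAsDone(bool is_done, bool exec_failed, const Status& overall_status) {
    // A failed instance is reported as "done" only once the overall status
    // carries the error; until then the final state waits for the final report.
    return (is_done && !exec_failed) || (exec_failed && !overall_status.ok());
  }

  int main() {
    // Race from IMPALA-10259: the instance has failed, but overall_status was
    // captured as OK before the error was recorded. The report must not claim
    // the instance is done yet, or the coordinator would drop the last report.
    std::cout << ReportAsDone(true, true, Status{true}) << "\n";   // prints 0
    std::cout << ReportAsDone(true, true, Status{false}) << "\n";  // prints 1
    return 0;
  }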