Posted to commits@impala.apache.org by st...@apache.org on 2021/04/02 07:54:58 UTC
[impala] 02/05: IMPALA-10259 (part 2): Fixed DCHECK error for backend in terminal state
This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit d621c68136b4a47388f579de5e3fe9fd3372bd68
Author: wzhou-code <wz...@cloudera.com>
AuthorDate: Wed Mar 31 20:42:05 2021 -0700
IMPALA-10259 (part 2): Fixed DCHECK error for backend in terminal state
The previous patch tried to fix the race by making backends avoid
sending a status report with the fragment instance state as "done" and
overall_status as OK after a fragment instance fails. However, it does
not work when the fragment instance state is updated while the status
report is being generated.
For a failed fragment instance, the backend should report the instance
as "done" only when overall_status is reported with an error. The final
fragment instance state will be reported in the final status report.
This prevents the coordinator from ignoring the last status report.
Testing:
- Manual tests
I could only reproduce the issue by adding artificial delays in
QueryState::ConstructReport() after setting overall_status for the
status report, while repeatedly running the test case
test_spilling.py::TestSpillingNoDebugActionDimensions
::test_spilling_no_debug_action. Verified that the issue did
not happen after applying this patch.
- Passed exhaustive tests.
Change-Id: Ifd9820f9944a78811ee7acfa5870a9418902b17b
Reviewed-on: http://gerrit.cloudera.org:8080/17258
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
be/src/runtime/fragment-instance-state.cc | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/be/src/runtime/fragment-instance-state.cc b/be/src/runtime/fragment-instance-state.cc
index ca7ab1c..df546c1 100644
--- a/be/src/runtime/fragment-instance-state.cc
+++ b/be/src/runtime/fragment-instance-state.cc
@@ -113,6 +113,7 @@ done:
current_state_.Load() > FInstanceExecStatePB::WAITING_FOR_PREPARE);
if (!status.ok()) {
+ exec_failed_.Store(true);
if (!is_prepared) {
UpdateState(StateEvent::EXEC_END);
@@ -124,7 +125,6 @@ done:
// with GetStatusReport(). This may lead to report the instance as "completed"
// without error, or the "final" profile being sent with the 'done' flag as false.
- exec_failed_.Store(true);
// Tell the managing 'QueryState' that we hit an error during execution.
query_state_->ErrorDuringExecute(status, instance_id());
@@ -284,7 +284,9 @@ void FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* instan
}
const TUniqueId& finstance_id = instance_id();
TUniqueIdToUniqueIdPB(finstance_id, instance_status->mutable_fragment_instance_id());
- const bool done = IsDone() || (ExecFailed() && !overall_status.ok());
+ // For a failed fragment instance, report it as "done" when overall_status is
+ // reported with an error. This prevents the coordinator from ignoring the
+ // last status report.
+ const bool done = (IsDone() && !ExecFailed()) || (ExecFailed() && !overall_status.ok());
instance_status->set_done(done);
instance_status->set_current_state(current_state());
DCHECK(profile() != nullptr);