Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2019/04/02 01:09:07 UTC

[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
......................................................................


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc@174
PS7, Line 174:   const auto trigger = MakeScopeExitTrigger(
             :       [this]() { last_report_time_ms_ = GenerateReportTimestamp(); });
Does this implicitly rely on the LIFO destruction order of local variables in C++, so that "notifier" won't be destroyed before "trigger"? Otherwise, we could end up notifying 'exec_complete_barrier' without having set 'last_report_time_ms_'. 

I may be paranoid, but can you please add a comment about this so future changes won't accidentally swap their order? Alternatively, I suppose we could skip the use of NotifyBarrierOnExit here and just notify 'exec_complete_barrier' in 'trigger' after setting 'last_report_time_ms_'.
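
For illustration, here's a minimal, self-contained sketch of the LIFO behavior being relied on (ScopeExit is a stand-in for Impala's MakeScopeExitTrigger, not the actual implementation):

  #include <functional>
  #include <iostream>
  #include <utility>

  // Stand-in for Impala's ScopeExitTrigger: runs the callable on destruction.
  class ScopeExit {
   public:
    explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopeExit() { fn_(); }
   private:
    std::function<void()> fn_;
  };

  int main() {
    // Locals are destroyed in reverse order of declaration, so 'trigger'
    // (declared last) runs first and 'notifier' runs second:
    ScopeExit notifier([] { std::cout << "notify exec_complete_barrier\n"; });
    ScopeExit trigger([] { std::cout << "set last_report_time_ms_\n"; });
    return 0;
    // Prints:
    //   set last_report_time_ms_
    //   notify exec_complete_barrier
  }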


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@549
PS7, Line 549: DCHECK(
DCHECK_LE
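
For reference, the reason to prefer the comparison macro (a minimal sketch; variable names are hypothetical):

  #include <cstdint>
  #include <glog/logging.h>  // Impala wraps these macros via common/logging.h

  void CheckReportTime(int64_t last_report_time_ms, int64_t now_ms) {
    // On failure, DCHECK_LE logs both operand values, e.g.
    // "Check failed: last_report_time_ms <= now_ms (1200 vs. 1000)",
    // whereas DCHECK(last_report_time_ms <= now_ms) only logs the
    // stringified expression.
    DCHECK_LE(last_report_time_ms, now_ms);
  }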


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@664
PS7, Line 664: DCHECK(
DCHECK_LE


http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@398
PS4, Line 398: 
> k. We could deprecate it in a separate patch I suppose. Fine with this as i
As Thomas said, this config option was useful when users faced instability with large Impala clusters due to IMPALA-2990 (which this patch is fixing). We can consider deprecating this flag once we are convinced that this patch improves things enough that the instability with large clusters is mostly mitigated.

The periodic reporting matters for long-running queries. The default reporting interval of 5 seconds could be longer than the duration of most short-running queries, while for long-running queries it probably doesn't matter much whether a report arrives every 5 seconds versus, say, every 30s or 60s. If we ever deprecate this flag, please consider bumping the default reporting interval to something larger.


http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2288
PS4, Line 2288:   return Status::OK();
              : }
              : 
              : void ImpalaServer::ExpireQuery(ClientRequestState* crs, const Status& status) {
              :   DCHECK(!status.ok());
              :   cancellation_thread_pool_->Offer(
> k, would be interested what Michael thinks about that idea
The executor's reporting path currently tolerates up to FLAGS_status_report_max_retries failed report attempts before giving up and declaring the coordinator inaccessible. It's intentionally not a time-based approach; instead, it relies on the periodic reporting to handle the retry.

Stepping back a bit, my understanding of why we want to approximate the maximum retry time is that the coordinator may not want to prematurely cancel a query due to a missing report from an executor if it knows that the executor *may* still be retrying to send a report. However, as Todd pointed out elsewhere, if an impalad can experience random pauses or random scheduling delays of the reporting thread, no amount of wait time will guarantee that an executor isn't in the middle of retrying a report when the query is cancelled. I suppose that's not an end-goal we should try to achieve.

This prompts me to wonder whether it would be any different if we just did the following:
- expose a coordinator's maximum tolerable lag time as a knob set to some arbitrary number
- expose an executor's maximum tolerable retry time as a knob set to some arbitrary number
- deprecate FLAGS_status_report_max_retries

An alternative (sketched in the code below) would be to:
- expose an executor's maximum tolerable retry time as a knob set to some arbitrary number
- deprecate FLAGS_status_report_max_retries
- set the coordinator's maximum tolerable lag time to (100 + x)% of the maximum tolerable retry time, where x *may be* tunable in case the default value causes any trouble.
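
A rough sketch of that alternative on the coordinator side (flag names are invented for illustration; they are not actual Impala flags):

  #include <cstdint>

  // Hypothetical knobs mirroring the proposal above.
  int64_t FLAGS_report_max_retry_time_ms = 60000;  // executor's retry budget
  int32_t FLAGS_coord_lag_slack_percent = 50;      // the 'x' in (100 + x)%

  // Coordinator-side check: a backend counts as unresponsive only once its
  // last report is older than the executor's whole retry budget plus slack.
  bool BackendIsUnresponsive(int64_t last_report_time_ms, int64_t now_ms) {
    const int64_t max_lag_ms = FLAGS_report_max_retry_time_ms
        * (100 + FLAGS_coord_lag_slack_percent) / 100;
    return now_ms - last_report_time_ms > max_lag_ms;
  }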

Todd / Thomas, what do you think?


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc@218
PS7, Line 218: DEFINE_int32(status_report_max_retries, 3,
> Is our timeout too aggressive here? If one of the backends goes into some k
I agree that the coordinator should be more lenient in handling lost reports to avoid false positives. On the other hand, no amount of timeout can save us from random pauses in an impalad, so we should work on fixing IMPALA-2800 if that's possible at all.
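
Back-of-the-envelope, as a simplification of the actual reporting loop (assumes one attempt per reporting interval and no backoff):

  #include <cstdint>

  // Approximate window during which an executor may still be retrying.
  int64_t ApproxRetryWindowMs(int32_t max_retries, int64_t report_interval_ms) {
    return (1 + max_retries) * report_interval_ms;
  }
  // e.g. ApproxRetryWindowMs(3, 5000) == 20000: with the defaults above, a
  // coordinator-side timeout much shorter than ~20s risks falsely declaring
  // an executor unresponsive while it is still retrying.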



-- 
To view, visit http://gerrit.cloudera.org:8080/12299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 02 Apr 2019 01:09:07 +0000
Gerrit-HasComments: Yes