Posted to reviews@impala.apache.org by "Thomas Marshall (Code Review)" <ge...@cloudera.org> on 2019/03/15 21:22:38 UTC

[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

Hello Michael Ho, Philip Zeyliger, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12299

to look at the new patch set (#3).

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
......................................................................

IMPALA-2990: timeout unresponsive queries in coordinator

The coordinator currently waits indefinitely if it does not receive a
status report from a backend. This can cause a query to hang forever
in certain situations, for example if the backend cancels itself after
its status report RPCs fail.

This patch adds a thread to ImpalaServer which periodically iterates
over all queries for which that server is the coordinator and cancels
any that haven't had a report from a backend in a certain amount of
time.
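
As a rough illustration of the check described above (not the code in
this patch), one pass of the thread could look like the sketch below.
QuerySketch, cancel_unresponsive_queries and run_checker are
hypothetical names; the real change lives in the C++ files listed
further down.

```python
import time
from dataclasses import dataclass, field


@dataclass
class QuerySketch:
    """Hypothetical stand-in for a query coordinated by this server."""
    query_id: str
    # Last report time per backend, in seconds on a monotonic clock.
    last_report_s: dict = field(default_factory=dict)
    cancelled: bool = False
    cancel_reason: str = ""

    def cancel(self, reason):
        self.cancelled = True
        self.cancel_reason = reason


def cancel_unresponsive_queries(queries, timeout_s, now_s):
    """One pass of the check: cancel every query with a silent backend."""
    cancelled_ids = []
    for query in queries:
        if query.cancelled:
            continue
        for backend, last_s in query.last_report_s.items():
            if now_s - last_s > timeout_s:
                query.cancel(f"no status report from {backend} in {timeout_s}s")
                cancelled_ids.append(query.query_id)
                break  # one silent backend is enough to cancel the query
    return cancelled_ids


def run_checker(queries, timeout_s, check_interval_s, stop_event):
    """Loop for the checker thread; stop_event is a threading.Event."""
    # Event.wait returns False on timeout, so the loop runs once per interval
    # until another thread sets the event.
    while not stop_event.wait(check_interval_s):
        cancel_unresponsive_queries(queries, timeout_s, time.monotonic())
```

The sketch shares the query list across threads without locking, which
a real implementation would have to handle.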

The timeout is calculated as the longest a backend will attempt to
retry sending status reports before giving up and cancelling itself.
With the default flags, this timeout is about 15 minutes.
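
One way such a bound might be composed is sketched below. This is an
assumption for illustration only: the parameter names are hypothetical
and are not Impala flags, the values are not Impala defaults, and the
patch may compute the timeout differently.

```python
def report_timeout_s(report_interval_s, max_retry_s, rpc_timeout_s):
    """Upper bound on the silence a coordinator should tolerate.

    All three parameters are hypothetical names, not Impala flags.
    """
    # A healthy backend may wait one full interval before its next report,
    # then keep retrying for up to max_retry_s, and its final attempt can
    # itself block for up to rpc_timeout_s before the backend gives up.
    return report_interval_s + max_retry_s + rpc_timeout_s


# Illustrative values only.
print(report_timeout_s(report_interval_s=5, max_retry_s=120, rpc_timeout_s=30))
```

Any coordinator-side timeout shorter than this bound risks cancelling
a query whose backend is still legitimately retrying.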

This patch introduces one new flag:
--hung_backend_check_interval_s: the interval, in seconds, at which
  the thread wakes up to perform the check

TODO:
- Run real cluster tests to determine appropriate default values for
  the flags and how scalable this approach is (e.g. should we use a
  thread pool instead of a single thread?)
- Write functional tests once the appropriate mechanisms are in place
  to simulate errors (IMPALA-8138)

Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M common/thrift/generate_error_codes.py
M tests/run-tests.py
10 files changed, 126 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/12299/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12299

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tm...@cloudera.com>