You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2018/06/19 00:25:56 UTC

[Impala-ASF-CR] IMPALA-1760: Implement shutdown command

Tim Armstrong has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/10744 )

Change subject: IMPALA-1760: Implement shutdown command
......................................................................

IMPALA-1760: Implement shutdown command

This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).

Details:
* In order to allow future admin commands, this is implemented with
  function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
  down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
  ":shutdown('hostname')", so that non-coordinators can be shut down
  and for the convenience of the client, which does not have to
  connect to the specific impalad. There is no assumption that the
  other impalad is registered in the statestore; just that the
  coordinator can connect to the other daemon's thrift endpoint.
  This simplifies things and allows shutdown in various important
  cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
  slower shutdown.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting
  queries to this coordinator via some out-of-band mechanism,
  e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
  which stop scheduling fragment instances on this daemon
  (if an executor).
4. the quiesce period (which is the AC queueing delay plus
  some additional leeway) expires
5. once the daemon is drained (i.e. no fragments, no registered
  queries), it shuts itself down.
6. If the daemon does not drain (e.g. rogue clients, long-running
  queries), after a longer timeout it will shut down anyway.

What this does do:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
  and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
  there is an out-of-band mechanism to prevent submission of more
  queries to the shut down coordinator.
* Long running queries or other issues (e.g. stuck fragments) will
  slow down but not prevent eventual shutdown.

Limitations:
* The quiesce period needs to be configured to be greater than the
  latency of statestore updates + scheduling + admission +
  coordinator startup. Otherwise a coordinator may send a
  fragment instance to the shutting down impalad. (We could
  automate this configuration as a follow-on)
* The quiesce period means a minimum latency for shutdown,
  even if the cluster is idle.
* We depend on the statestore detecting the process going down
  if queries are still running on that backend when the timeout
  expires. This may still be subject to existing problems,
  e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
  queries are running.
* End-to-end test of shutting down a coordinator
  - New queries cannot be started on coord, existing queries continue to run
  - Exercises various Beeswax and HS2 operations.

Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
---
M be/src/runtime/backend-client.h
M be/src/runtime/data-stream-test.cc
M be/src/scheduling/scheduler.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/fault-injection-util.h
M be/src/util/default-path-handlers.cc
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/StatestoreService.thrift
M common/thrift/Types.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AdminFnStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/LimitElement.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTestV2.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M tests/common/impala_cluster.py
M tests/common/impala_service.py
M tests/custom_cluster/test_restart_services.py
M tests/hs2/hs2_test_suite.py
M tests/hs2/test_fetch_first.py
M www/backends.tmpl
M www/root.tmpl
35 files changed, 963 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/10744/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d5606ccfec84db4482c1e7f0f198103aad141a0
Gerrit-Change-Number: 10744
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>