Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/06/19 17:25:00 UTC

[jira] [Assigned] (IMPALA-1760) Add decommissioning support / graceful shutdown / quiesce

     [ https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-1760:
-------------------------------------

    Assignee: Tim Armstrong

> Add decommissioning support / graceful shutdown / quiesce
> ---------------------------------------------------------
>
>                 Key: IMPALA-1760
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1760
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Distributed Exec
>    Affects Versions: Impala 2.1.1
>            Reporter: Henry Robinson
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: resource-management, scalability, scheduler, usability
>
> In larger clusters, node maintenance is a frequent occurrence. There is currently no way to stop an Impala node without failing its running queries, short of draining queries across the whole cluster first. We should fix that.
> Here's a proposal:
> * Add a {{Decommission}} RPC to ImpalaServer. Calling this causes an Impala daemon to stop accepting new fragments or queries.
> * The Impala daemon should mark its entry in the membership statestore topic as 'decommissioning'. This tells other Impala daemons not to try to assign work to it.
> * Once the running queries / fragments have finished (or maybe after a timeout has elapsed?), the Impala daemon will remove itself entirely from the statestore membership topic and enter 'offline mode'. 
> * Either {{Decommission()}} returns at that point, or the caller can check the statestore topic to see when the daemon has gone offline.
> * Any Impala daemon that's in the process of sending work to a decommissioning node (because of the race between the {{Decommission()}} call and every node getting the statestore update) should retry the query from the point of scheduling. It should only do this, say, three times before aborting the query.
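
The proposal above can be sketched as a small state machine plus a bounded retry loop. This is only an illustrative model, not Impala code: the `Daemon` class, the `State` enum, the `schedule_with_retry` helper, and the plain dict standing in for the statestore membership topic are all hypothetical names invented for this sketch. It also assumes that "three times" means three scheduling attempts in total, and it leaves out the optional timeout.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    DECOMMISSIONING = "decommissioning"
    OFFLINE = "offline"


class DaemonDecommissioning(Exception):
    """Raised when work is sent to a daemon that no longer accepts it."""


class Daemon:
    """Hypothetical stand-in for an Impala daemon; not Impala's real API."""

    def __init__(self, name, membership):
        self.name = name
        self.membership = membership  # dict standing in for the statestore topic
        self.running = set()
        self.state = State.ACTIVE
        membership[name] = State.ACTIVE

    def decommission(self):
        # Stop accepting new work and advertise the new state to peers.
        self.state = State.DECOMMISSIONING
        self.membership[self.name] = State.DECOMMISSIONING
        self._maybe_go_offline()

    def start_fragment(self, fragment_id):
        if self.state is not State.ACTIVE:
            raise DaemonDecommissioning(self.name)
        self.running.add(fragment_id)

    def finish_fragment(self, fragment_id):
        self.running.discard(fragment_id)
        self._maybe_go_offline()

    def _maybe_go_offline(self):
        # Once drained, remove the membership entry entirely.
        if self.state is State.DECOMMISSIONING and not self.running:
            self.membership.pop(self.name, None)
            self.state = State.OFFLINE


MAX_SCHEDULING_ATTEMPTS = 3  # assumption: "three times" = three attempts total


def schedule_with_retry(fragment_id, daemons, stale_view):
    """Retry from the point of scheduling when the chosen daemon is draining.

    stale_view is the coordinator's possibly outdated copy of the membership
    topic, which models the race described in the proposal.
    """
    for attempt in range(1, MAX_SCHEDULING_ATTEMPTS + 1):
        candidates = [n for n, s in stale_view.items() if s is State.ACTIVE]
        if not candidates:
            break
        target = candidates[0]
        try:
            daemons[target].start_fragment(fragment_id)
            return target, attempt
        except DaemonDecommissioning:
            # Record the rejection locally, then reschedule.
            stale_view[target] = State.DECOMMISSIONING
    raise RuntimeError("query aborted: no daemon accepted the fragment")
```

In this model a coordinator holding a stale membership view first picks the draining daemon, is rejected, and succeeds on a later attempt against a daemon that is still active. The draining daemon removes its membership entry only after its last running fragment finishes.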



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)