Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/11/07 06:36:02 UTC

[jira] [Resolved] (IMPALA-1575) Cancelled queries do not yield resources until close

     [ https://issues.apache.org/jira/browse/IMPALA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-1575.
-----------------------------------
    Resolution: Fixed


IMPALA-1575: part 2: yield admission control resources

This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

As in part 1, query execution is considered to be finished
under any of the following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query
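
The four conditions above can be read as a single predicate. The sketch
below is illustrative only: QueryState and execution_finished() are
hypothetical names, not Impala structures, and treating an already-closed
query as finished is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class QueryState:
    """Hypothetical snapshot of a client request (not an Impala type)."""
    failed: bool = False            # 1. the query hit an error
    idle_timed_out: bool = False    # 2. cancelled by the idle query timeout
    reached_eos: bool = False       # 3. last row fetched, or DML completed
    client_cancelled: bool = False  # 4. cancelled by the client, still open
    closed: bool = False            # the client closed the query


def execution_finished(q: QueryState) -> bool:
    """True once the query is no longer actively executing.

    None of the four listed conditions requires the client to close the
    query. A closed query is also treated as finished here (assumption).
    """
    return (q.failed or q.idle_timed_out or q.reached_eos
            or q.client_cancelled or q.closed)
```

For example, execution_finished(QueryState(reached_eos=True)) is true even
though closed is still False, which is the case this change targets.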

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory are
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".
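
The second mechanism can be sketched as follows. This is a simplified
model, not the MemTracker code: QueryMem and pool_mem_reserved() are
hypothetical names, and counting a finished query at its current
consumption (rather than at zero) is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class QueryMem:
    """Hypothetical per-query memory accounting (not an Impala type)."""
    mem_limit: int          # bytes; -1 means "no limit" in this sketch
    consumption: int        # bytes currently tracked for the query
    finished: bool = False  # execution finished, client request still open


def pool_mem_reserved(queries: Iterable[QueryMem]) -> int:
    """Sum the memory treated as "reserved" across a pool's queries."""
    total = 0
    for q in queries:
        if q.mem_limit >= 0 and not q.finished:
            # Still executing: the whole limit counts as reserved.
            total += q.mem_limit
        else:
            # Finished or unlimited: count only what is actually in use
            # (assumption of this sketch).
            total += q.consumption
    return total
```

With a 1000-byte limit and 200 bytes in use, the query contributes 1000
while it is executing and 200 once it has finished, so the difference
becomes available for admitting new queries.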

The preconditions for releasing an admitted query are subtle because the
queries are being admitted to a distributed system, not just the
coordinator.  The comment for ReleaseAdmissionControlResources()
documents the preconditions and rationale. Note that the preconditions
are weaker than the preconditions of calling UnregisterQuery() before
this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior. This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

This is based on an earlier patch by Joe McDonnell.

Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
Reviewed-on: http://gerrit.cloudera.org:8080/8323
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins

> Cancelled queries do not yield resources until close
> ----------------------------------------------------
>
>                 Key: IMPALA-1575
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1575
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.1, Impala 2.3.0
>            Reporter: Henry Robinson
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: query-lifecycle, resource-management
>             Fix For: Impala 2.11.0
>
>
> A cancelled query (for example, due to a timeout) or a query that has reached eos (but has not been explicitly closed) holds (1) resources on the coordinator fragment, (2) all resources accounted for by the admission controller, and (3) Llama reservations. (However, Llama has been unsupported in CDH 5.5 and beyond, so (3) no longer applies.) None of these are released until the query is closed, which may not happen promptly for some clients.
> This frequently occurs with Hue. Hue (and some other clients that behave similarly) will not close a query until the user explicitly closes it (in Hue's case, via a JavaScript callback that the browser sends when the Hue tab is closed). If the query is left unattended (or the Hue tab is on a laptop that is closed, or the browser crashes), the close call is never sent, and although the query will "time out", the cancellation does not properly clean up resources.
> One way to mitigate this issue is to use the --idle_session_timeout impalad argument to fully close a session and all associated queries after some amount of time (but this workaround does not work in all cases).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)