Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/03/25 01:05:00 UTC

[jira] [Commented] (IMPALA-10590) Ensure admissiond stays in sync with coordinators

    [ https://issues.apache.org/jira/browse/IMPALA-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308270#comment-17308270 ] 

ASF subversion and git services commented on IMPALA-10590:
----------------------------------------------------------

Commit e3bafcbef4fd7152ecfcbc7d331e41e9778caf15 in impala's branch refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e3bafcb ]

IMPALA-10590: Introduce admission service heartbeat mechanism

Currently, if a ReleaseQuery rpc fails, it's possible for the
admission service to think that some resources are still being used
that are actually free.

This patch fixes the issue by introducing a periodic heartbeat rpc
from coordinators to the admission service which contains a list of
queries registered at that coordinator.

If there is a query that the admission service thinks is running but
is not included in the heartbeat, the admission service can conclude
that the query must have already completed and release its resources.
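The reconciliation step above can be modeled in a few lines. This is a minimal sketch, not Impala's code. The class `AdmissionService` and its methods are hypothetical. Resources are reduced to a single memory counter. The sketch also assumes that a coordinator registers a query before admission and that heartbeats never race with new admissions. A real service would need to order an in-flight heartbeat against queries admitted after the coordinator built its list.

```python
from collections import defaultdict


class AdmissionService:
    """Hypothetical model of admissiond's per-coordinator bookkeeping."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        # coordinator id -> {query id -> admitted memory in bytes}
        self.admitted = defaultdict(dict)

    def used_bytes(self):
        return sum(m for queries in self.admitted.values() for m in queries.values())

    def admit_query(self, coord_id, query_id, mem_bytes):
        """Admit the query if capacity remains; return True on success."""
        if self.used_bytes() + mem_bytes > self.capacity_bytes:
            return False
        self.admitted[coord_id][query_id] = mem_bytes
        return True

    def release_query(self, coord_id, query_id):
        """Handler for a ReleaseQuery rpc that actually arrives."""
        self.admitted[coord_id].pop(query_id, None)

    def heartbeat(self, coord_id, registered_query_ids):
        """Release every query admitted for coord_id that the coordinator no
        longer reports as registered; return the released ids, sorted."""
        registered = set(registered_query_ids)
        stale = [q for q in self.admitted[coord_id] if q not in registered]
        for q in stale:
            del self.admitted[coord_id][q]
        return sorted(stale)


svc = AdmissionService(capacity_bytes=100)
svc.admit_query("coord-1", "q1", 60)
svc.admit_query("coord-1", "q2", 30)
# q1 finishes, but its ReleaseQuery rpc is lost, so 90 bytes still look used.
print(svc.admit_query("coord-1", "q3", 40))  # False
print(svc.heartbeat("coord-1", ["q2"]))      # ['q1']
print(svc.admit_query("coord-1", "q3", 40))  # True
```

Keying the bookkeeping by coordinator ensures that a heartbeat from one coordinator can only release queries that coordinator owns.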

Testing:
- Added a test that uses a debug action to simulate ReleaseQuery rpcs
  failing and checks that query resources are released properly.
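The test scenario can be sketched in the same spirit. This is a hypothetical model, not the actual test. The boolean `fail_release_rpcs` stands in for Impala's debug action. `max_attempts=3` is an assumed retry budget, since the issue only says the coordinator retries "a few times". All class and method names are illustrative. The model tracks a single coordinator.

```python
class RpcError(Exception):
    pass


class FakeAdmissiond:
    """Hypothetical single-coordinator admission service."""

    def __init__(self):
        self.running = {}  # query id -> admitted memory in bytes

    def release_query(self, query_id):
        self.running.pop(query_id, None)

    def heartbeat(self, registered_query_ids):
        # Anything admitted but no longer registered must have finished.
        for q in [q for q in self.running if q not in registered_query_ids]:
            del self.running[q]


class Coordinator:
    def __init__(self, admissiond, fail_release_rpcs=False, max_attempts=3):
        self.admissiond = admissiond
        self.fail_release_rpcs = fail_release_rpcs  # stand-in for the debug action
        self.max_attempts = max_attempts            # assumed retry budget
        self.registered = set()

    def start_query(self, query_id, mem_bytes):
        # Simplification: admission is recorded directly, with no AdmitQuery rpc.
        self.registered.add(query_id)
        self.admissiond.running[query_id] = mem_bytes

    def _send_release(self, query_id):
        if self.fail_release_rpcs:
            raise RpcError("injected ReleaseQuery failure")
        self.admissiond.release_query(query_id)

    def finish_query(self, query_id):
        """Unregister the query and try to release it; give up after max_attempts."""
        self.registered.discard(query_id)
        for _ in range(self.max_attempts):
            try:
                self._send_release(query_id)
                return True
            except RpcError:
                continue
        return False

    def send_heartbeat(self):
        self.admissiond.heartbeat(set(self.registered))


admissiond = FakeAdmissiond()
coord = Coordinator(admissiond, fail_release_rpcs=True)
coord.start_query("q1", 64)
print(coord.finish_query("q1"))  # False: every ReleaseQuery attempt failed
print(admissiond.running)        # {'q1': 64}: the resources leaked
coord.send_heartbeat()
print(admissiond.running)        # {}: the heartbeat reclaimed them
```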

Change-Id: Ia528d92268cea487ada20b476935a81166f5ad34
Reviewed-on: http://gerrit.cloudera.org:8080/17194
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Ensure admissiond stays in sync with coordinators
> -------------------------------------------------
>
>                 Key: IMPALA-10590
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10590
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>
> Currently, it's possible for the admission service to have an incorrect view of what resources are being used in the cluster if there are rpc failures. For example, if the ReleaseQuery rpc fails, the coordinator will retry a few times and then give up. In this case, the query has completed but the admission service doesn't know it, so it will not allow other queries to be scheduled with those resources.
> We can solve this by adding a periodic heartbeat rpc from coordinators to the admission service. This heartbeat will include the query ids for all queries currently running at each coordinator, and then the admission service can clean up resources allocated to any queries that are not in the list, on the assumption that those queries must have completed already.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)