Posted to issues@tez.apache.org by "TezQA (JIRA)" <ji...@apache.org> on 2016/08/10 23:29:20 UTC
[jira] [Commented] (TEZ-3405) Support ability for AM to kill itself if there is no client heartbeating to it
[ https://issues.apache.org/jira/browse/TEZ-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416197#comment-15416197 ]
TezQA commented on TEZ-3405:
----------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12823143/TEZ-3405.1.patch
against master revision b8ff941.
{color:red}-1 patch{color}. master compilation may be broken.
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1905//console
This message is automatically generated.
> Support ability for AM to kill itself if there is no client heartbeating to it
> ------------------------------------------------------------------------------
>
> Key: TEZ-3405
> URL: https://issues.apache.org/jira/browse/TEZ-3405
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gunther Hagleitner
> Assignee: Hitesh Shah
> Priority: Critical
> Attachments: TEZ-3405.1.patch
>
>
> HiveServer2 optionally maintains a pool of AMs in either Tez or LLAP mode. This is done to amortize the cost of launching a Tez session.
> We also try to kill all these AMs in a shutdown hook when HS2 goes down. However, there are cases where HS2 doesn't get the chance to kill these AMs before it goes away. As a result, these zombie AMs hang around until the timeout kicks in.
> The trouble with the timeout is that we have to set it fairly high. Otherwise the benefit of having pre-launched AMs obviously goes away (in a lightly loaded cluster).
> So, if people kill/restart HS2, they often run into situations where the cluster/queue doesn't have any more capacity for AMs. They either have to manually kill the zombies or wait.
> The request is therefore for Tez to maintain a heartbeat to the client. If the client goes away, the AM should exit. That way we can keep the AMs alive for a long time regardless of activity and, at the same time, not have to worry about them if HS2 goes down.
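
The heartbeat behaviour requested above can be illustrated with a minimal sketch. It is not the attached patch and not part of the Tez API: the class name ClientHeartbeatMonitor, its methods, and the timeout parameter are all hypothetical, and the clock is injected only so the logic can be exercised without sleeping.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical watchdog illustrating the requested behaviour.
 * Not part of Tez; all names here are assumptions for illustration.
 */
final class ClientHeartbeatMonitor {
    private final long timeoutMillis;
    private final LongSupplier clock;          // injected time source, in milliseconds
    private final AtomicLong lastHeartbeatMillis;

    ClientHeartbeatMonitor(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        // Treat construction time as the first heartbeat.
        this.lastHeartbeatMillis = new AtomicLong(clock.getAsLong());
    }

    /** Called whenever the client (e.g. HS2) pings the AM. */
    void recordHeartbeat() {
        lastHeartbeatMillis.set(clock.getAsLong());
    }

    /** True once more than timeoutMillis has passed since the last heartbeat. */
    boolean isClientGone() {
        return clock.getAsLong() - lastHeartbeatMillis.get() > timeoutMillis;
    }
}
```

Under these assumptions, an AM could poll isClientGone() from a periodic task (for example one scheduled with ScheduledExecutorService.scheduleAtFixedRate, passing System::currentTimeMillis as the clock) and begin shutting down when it returns true. The heartbeat timeout can then be short, while the idle-session timeout stays high.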
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)