Posted to issues@tez.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2018/01/05 20:10:00 UTC

[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.

    [ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313799#comment-16313799 ] 

Rohini Palaniswamy commented on TEZ-160:
----------------------------------------

Recently I noticed that about 5% of Pig jobs launched from Oozie in a cluster had an application status of KILLED even though the DAG succeeded and the Pig scripts completed successfully. This happens because Pig calls TezClient.stop() on shutdown. If the application has not finished within 10 seconds of that call, TezClient calls frameworkClient.killApplication(sessionAppId), which kills the AM. Because the AM sleeps for 5 seconds after the shutdown is issued, whether the application finished as SUCCEEDED or KILLED depended on whether the rest of the shutdown completed within the remaining 5 seconds.
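A toy model of the timing may make the race easier to see. It assumes the final status is decided only when the AM exits, and that the AM's time to exit is its real shutdown work plus the fixed sleep. `ShutdownRace` and `finalStatus` are hypothetical names, not Tez or YARN APIs; the 10-second client wait and the 5-second AM sleep are the values from this comment.

```java
/**
 * Toy model (hypothetical, not Tez code) of the race: the client sends a
 * shutdown, then waits up to clientTimeoutMs for the application to finish
 * before killing it; the AM sleeps amSleepMs in addition to its real
 * shutdown work.
 */
public final class ShutdownRace {

    private ShutdownRace() {}

    /** Returns the final application status in this model. */
    public static String finalStatus(long shutdownWorkMs, long amSleepMs, long clientTimeoutMs) {
        // Time from the shutdown request until the AM exits, in this model.
        long amFinishMs = amSleepMs + shutdownWorkMs;
        // If the AM is still running when the client's wait expires, the client kills it.
        return amFinishMs <= clientTimeoutMs ? "SUCCEEDED" : "KILLED";
    }

    public static void main(String[] args) {
        long clientTimeoutMs = 10_000; // client-side wait before killApplication
        long amSleepMs = 5_000;        // fixed sleep at the end of AM completion
        for (long workMs : new long[] {1_000, 4_000, 6_000}) {
            System.out.printf("work=%dms sleep=%dms -> %s%n",
                workMs, amSleepMs, finalStatus(workMs, amSleepMs, clientTimeoutMs));
        }
        // Without the sleep, the same 6 s of shutdown work fits inside the window.
        System.out.printf("work=6000ms sleep=0ms -> %s%n", finalStatus(6_000, 0, clientTimeoutMs));
    }
}
```

With the sleep in place, only about 5 seconds of the 10-second window are left for actual shutdown work, so a moderately slow shutdown tips the result from SUCCEEDED to KILLED.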

Can we skip this sleep if it is a user-initiated shutdown, or at least lower it to 1 or 2 seconds? In the case of Pig, it is a Tez session and the Pig client is calling shutdown. I think we can skip it in general for Tez sessions. The only time a session goes down on its own is when the session timeout expires, and adding another 5 seconds in that case is also wasteful.
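The proposal could be sketched roughly as a policy that picks the delay from the shutdown reason and whether the AM runs in session mode. Everything here is hypothetical: `ShutdownSleepPolicy`, the `ShutdownReason` enum and `sleepBeforeExitMs` are illustrative names, not part of the Tez codebase. The fallback value could just as well be lowered to 1 or 2 seconds instead of keeping 5.

```java
/**
 * Hypothetical sketch of the proposal: decide how long the AM sleeps before
 * exiting based on why it is shutting down. Names are illustrative only and
 * are not part of the Tez API.
 */
public final class ShutdownSleepPolicy {

    /** Hypothetical shutdown reasons. */
    public enum ShutdownReason { CLIENT_REQUESTED, SESSION_TIMEOUT, DAG_COMPLETED }

    /** The current fixed delay discussed in TEZ-160. */
    static final long DEFAULT_SLEEP_MS = 5_000;

    private ShutdownSleepPolicy() {}

    public static long sleepBeforeExitMs(boolean isSession, ShutdownReason reason) {
        // A client that asked for the shutdown is already waiting on it; skip the delay.
        if (reason == ShutdownReason.CLIENT_REQUESTED) {
            return 0;
        }
        // A session otherwise only goes down when the session timeout expires,
        // where an extra delay is also wasteful.
        if (isSession) {
            return 0;
        }
        // Non-session AMs keep the existing delay.
        return DEFAULT_SLEEP_MS;
    }

    public static void main(String[] args) {
        System.out.println(sleepBeforeExitMs(true, ShutdownReason.CLIENT_REQUESTED));
        System.out.println(sleepBeforeExitMs(true, ShutdownReason.SESSION_TIMEOUT));
        System.out.println(sleepBeforeExitMs(false, ShutdownReason.DAG_COMPLETED));
    }
}
```

Keying on the shutdown reason rather than only on session mode keeps the existing delay for non-session AMs while removing it wherever a client is known to be waiting.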

> Remove 5 second sleep at the end of AM completion.
> --------------------------------------------------
>
>                 Key: TEZ-160
>                 URL: https://issues.apache.org/jira/browse/TEZ-160
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>              Labels: TEZ-0.2.0
>         Attachments: test.timeouts.txt
>
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion status from the AM after job completion. It, instead, always relies on the RM for this information. The information returned by the AM should be used while it's available.


