Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/07/23 02:44:05 UTC
[jira] [Commented] (TEZ-2311) AM can hang if kill received while recovering from previous attempt
[ https://issues.apache.org/jira/browse/TEZ-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637994#comment-14637994 ]
Jeff Zhang commented on TEZ-2311:
---------------------------------
[~lichangleo] [~jlowe], the context here is not clear to me. I have to admit that the current recovery code is fragile because of how Tez has evolved. I am redesigning recovery and aim to get it into 0.8, so if you don't mind, I will move this jira to 0.8. If you think this is critical for you, could you please provide more context? App logs would be helpful.
> AM can hang if kill received while recovering from previous attempt
> -------------------------------------------------------------------
>
> Key: TEZ-2311
> URL: https://issues.apache.org/jira/browse/TEZ-2311
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Jason Lowe
> Labels: Recovery
>
> We saw an instance of a Tez job hanging despite receiving multiple kill requests from clients. The AM was recovering from a prior attempt when the first kill request arrived.