Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2014/11/27 13:01:12 UTC

[jira] [Comment Edited] (TEZ-1273) Refactor DAGAppMaster to state machine based

    [ https://issues.apache.org/jira/browse/TEZ-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227579#comment-14227579 ] 

Jeff Zhang edited comment on TEZ-1273 at 11/27/14 12:00 PM:
------------------------------------------------------------

Attached a new patch and the new state machine. [~hitesh] Please help review.

*  Added unit tests for the different state machine transitions. Failures from NEW, INITED and RECOVERING are not easy to reproduce in a unit test, so I verified them manually.
*  All unit tests pass.
*  The state machine behaves differently from the current AM in the following ways:
** When a session times out while idle, should the AM go to SUCCEEDED or KILLED? In the patch it goes to KILLED, but the current AM goes to SUCCEEDED. IMO, KILLED makes more sense, since the AM is killing itself.
** Is it necessary to set shouldUnregisterFlag when the session times out? In the patch I set the flag, but the existing code doesn't (which means the AM will be restarted and continue to wait until it times out again, which looks weird to me).
** When the AM is killed while running, should it go to SUCCEEDED or KILLED? In the patch it goes to KILLED, but the current AM goes to SUCCEEDED.
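A minimal, table-driven sketch of the transitions proposed above, in plain Java, to make the three behavior changes concrete. This is not the actual Tez DAGAppMaster code or its state machine framework. NEW, INITED, RECOVERING, SUCCEEDED, KILLED and shouldUnregister come from the comment; the class name, the IDLE, RUNNING and FAILED states, and all event names are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the proposed AM transitions; not Tez's real classes.
class AmStateMachineSketch {

    // NEW, INITED, RECOVERING, SUCCEEDED, KILLED are from the comment;
    // IDLE, RUNNING and FAILED are assumptions for illustration.
    enum State { NEW, INITED, RECOVERING, IDLE, RUNNING, SUCCEEDED, FAILED, KILLED }

    // All event names are hypothetical.
    enum Event { INIT, START, RECOVERED, DAG_SUBMITTED, DAG_FINISHED, SESSION_TIMEOUT, KILL, ERROR }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;
    private boolean shouldUnregister = false;

    AmStateMachineSketch() {
        add(State.NEW, Event.INIT, State.INITED);
        add(State.INITED, Event.START, State.RECOVERING);
        add(State.RECOVERING, Event.RECOVERED, State.IDLE);
        add(State.IDLE, Event.DAG_SUBMITTED, State.RUNNING);
        add(State.RUNNING, Event.DAG_FINISHED, State.IDLE);
        // Proposed: an idle session that times out ends in KILLED, not SUCCEEDED.
        add(State.IDLE, Event.SESSION_TIMEOUT, State.KILLED);
        // Proposed: a kill while running ends in KILLED, not SUCCEEDED.
        add(State.RUNNING, Event.KILL, State.KILLED);
        add(State.IDLE, Event.KILL, State.KILLED);
        // Failures during startup (hard to unit test per the comment) go to FAILED.
        for (State s : new State[] {State.NEW, State.INITED, State.RECOVERING}) {
            add(s, Event.ERROR, State.FAILED);
        }
    }

    private void add(State from, Event event, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(event, to);
    }

    // Applies an event; events with no registered transition are rejected.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        if (event == Event.SESSION_TIMEOUT) {
            // Proposed: set the unregister flag on timeout so the AM is not
            // restarted only to wait for another timeout.
            shouldUnregister = true;
        }
        current = next;
        return current;
    }

    State getState() { return current; }

    boolean shouldUnregister() { return shouldUnregister; }

    public static void main(String[] args) {
        AmStateMachineSketch am = new AmStateMachineSketch();
        am.handle(Event.INIT);
        am.handle(Event.START);
        am.handle(Event.RECOVERED);
        am.handle(Event.SESSION_TIMEOUT);
        System.out.println(am.getState() + " unregister=" + am.shouldUnregister());
    }
}
```

Running main prints `KILLED unregister=true`. Keeping the transitions in a table means an event with no registered transition fails fast instead of being silently ignored. A real implementation would more likely reuse the state machine framework the other entities (Vertex, Task) already use.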




was (Author: zjffdu):
Attached a new patch and the new state machine. [~hitesh] Please help review.

*  Added unit tests for the different state machine transitions. Failures from Init and Start are not easy to reproduce in a unit test, so I verified them manually.
*  All unit tests pass.
*  The state machine behaves differently from the existing AM in the following ways:
** When a session times out while idle, should the AM go to SUCCEEDED or KILLED? In the patch it goes to KILLED, but the existing AM goes to SUCCEEDED. IMO, KILLED makes more sense, since the AM is killing itself.
** Is it necessary to set shouldUnregisterFlag when the session times out? In the patch I set the flag, but the existing code doesn't (which means the AM will be restarted and continue to wait until it times out again, which looks weird to me).
** When the AM is killed while running, should it go to SUCCEEDED or KILLED? In the patch it goes to KILLED, but the current AM goes to SUCCEEDED.



> Refactor DAGAppMaster to state machine based
> --------------------------------------------
>
>                 Key: TEZ-1273
>                 URL: https://issues.apache.org/jira/browse/TEZ-1273
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: DAGAppMaster_3.pdf, TEZ-1273-3.patch, Tez-1273-2.patch, Tez-1273.patch, dag_app_master.pdf, dag_app_master2.pdf
>
>
> Almost all our entities (Vertex, Task, etc.) are state machine based and written using a formal state machine. But DAGAppMaster is not written as a formal state machine, even though it has state-machine-based behavior. This JIRA is for refactoring it into a formal state machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)