Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2014/10/30 04:28:33 UTC

[jira] [Comment Edited] (TEZ-1703) Exception handling for InputInitializer

    [ https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189576#comment-14189576 ] 

Jeff Zhang edited comment on TEZ-1703 at 10/30/14 3:28 AM:
-----------------------------------------------------------

bq.  {code}
 DAGTerminationCause.VERTEX_FAILURE,
 vertexEvent.getVertexTerminationCause() == null ? VertexTerminationCause.OTHER_VERTEX_FAILURE
 : vertexEvent.getVertexTerminationCause());
 DAGTerminationCause.VERTEX_FAILURE, VertexTerminationCause.OTHER_VERTEX_FAILURE);
{code}
bq. This is required so that all vertices don't get the same termination cause as the first vertex to fail ?
Yes. Otherwise every vertex would end up with the same termination cause, which doesn't make sense to me. Besides, there would be an issue in VertexImpl.checkVertexForCompletion, which checks the termination cause but does not check for ROOT_INPUT_INIT_FAILURE.
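
To make the intent concrete, here is a minimal standalone sketch, not the actual DAGImpl/VertexImpl code. It assumes the snippet above reads as "stop propagating the event's cause, always pass OTHER_VERTEX_FAILURE to the other vertices". The enum is a cut-down stand-in for VertexTerminationCause, and TerminationSketch and causesAfterFailure are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Cut-down stand-in for Tez's VertexTerminationCause (only the constants discussed here).
enum VertexTerminationCause { ROOT_INPUT_INIT_FAILURE, OTHER_VERTEX_FAILURE }

// Hypothetical helper, for illustration only.
final class TerminationSketch {
  // The vertex that actually failed keeps its own cause; every other vertex
  // terminated as a consequence gets OTHER_VERTEX_FAILURE rather than
  // inheriting the first failure's cause.
  static Map<String, VertexTerminationCause> causesAfterFailure(
      List<String> vertices, String failedVertex, VertexTerminationCause failedCause) {
    Map<String, VertexTerminationCause> result = new LinkedHashMap<>();
    for (String v : vertices) {
      result.put(v, v.equals(failedVertex)
          ? failedCause
          : VertexTerminationCause.OTHER_VERTEX_FAILURE);
    }
    return result;
  }
}
```

With this shape, only the vertex whose initializer failed reports ROOT_INPUT_INIT_FAILURE, so the other vertices no longer carry a cause that is not their own.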

bq. Prior to the patch
bq. It's possible for events/notifications to be sent to a complete Initializer since the initializers / events are handled in separate threads. The setComplete() and isComplete checks aren't sufficient to avoid this.
bq. Ideally, completed initializers should just handle these events gracefully, but that's not something that Tez can guarantee. We need to handle such situations, likely in a separate jira.
After initialization completes, InputInitializerManager is shut down. Will that solve this issue?
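
For reference, one possible way to close the race described in the quote is to make the completion check and the event delivery atomic. This is only a sketch under that assumption, not what Tez does today; InitializerWrapperSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around a single initializer, for illustration only.
final class InitializerWrapperSketch {
  private final Object lock = new Object();
  private boolean complete = false;
  private final List<String> delivered = new ArrayList<>();

  // Called by the initializer thread once initialization has finished.
  void setComplete() {
    synchronized (lock) {
      complete = true;
    }
  }

  // Called by the event-handling thread. The check and the delivery happen
  // under the same lock, so completion cannot slip in between them.
  boolean deliver(String event) {
    synchronized (lock) {
      if (complete) {
        return false; // drop events that arrive after completion
      }
      delivered.add(event); // stands in for invoking the user's handler
      return true;
    }
  }

  List<String> delivered() {
    synchronized (lock) {
      return new ArrayList<>(delivered);
    }
  }
}
```

The trade-off is that the user's handler would run while the lock is held, so a slow handler delays setComplete().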

bq. Will inputInitializer failures never go through this transition ? It may be better to set this up based on the SOURCE information available in the exception.
InputInitializer failures set the TerminationCause to ROOT_INPUT_INIT_FAILURE rather than AM_USERCODE_EXCEPTION, which is a special cause. Maybe we could still split AMUserCodeException into VertexManagerException/EdgeManagerException; that would be much clearer and more consistent.

bq. The state machine in VertexImpl will need to change to handle INITIALIER_FAILED in some more states, and fail the vertex.
Add more transitions to the state machine.
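
A rough sketch of what "handle the failure in more states" could look like, written as a plain lookup rather than Tez's actual state machine code. The state names are simplified stand-ins, and the set of states that react to the failure is an assumption chosen for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified stand-in for the vertex states; not the full VertexState enum.
enum SketchVertexState { NEW, INITIALIZING, INITED, RUNNING, FAILED }

// Hypothetical transition helper, for illustration only.
final class VertexStateSketch {
  // Assumed set of states in which an initializer failure fails the vertex.
  private static final Set<SketchVertexState> FAILS_ON_INIT_FAILURE = EnumSet.of(
      SketchVertexState.INITIALIZING,
      SketchVertexState.INITED,
      SketchVertexState.RUNNING);

  // Returns the next state after an initializer-failed event arrives.
  static SketchVertexState onInitializerFailed(SketchVertexState current) {
    return FAILS_ON_INIT_FAILURE.contains(current) ? SketchVertexState.FAILED : current;
  }
}
```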

bq. And +1 for renaming the file. Please do that just before the commit though - not as part of iterative patches.
Actually it involves more than renaming RootInputInitializerManager. The full set of renames is:
* RootInputInitializerManager -> InputInitializerManager
* TezRootInputInitializerContextImpl -> TezInputInitializerContextImpl
* VertexEventRootInputInitialized -> VertexEventInputInitialized
* VertexEventRootInputFailed -> VertexEventInputFailed
* VertexTerminationCause.ROOT_INPUT_INIT_FAILURE -> VertexTerminationCause.INPUT_INIT_FAILURE
* EventType.ROOT_INPUT_DATA_INFORMATION_EVENT -> EventType.INPUT_DATA_INFORMATION_EVENT
* EventType.ROOT_INPUT_INITIALIZER_EVENT -> EventType.INPUT_INITIALIZER_EVENT
* VertexEventType.V_ROOT_INPUT_INITIALIZED -> VertexEventType.V_INPUT_INITIALIZED
* VertexEventType.V_ROOT_INPUT_FAILED -> VertexEventType.V_INPUT_INIT_FAILED






was (Author: zjffdu):
bq.  {code}
 DAGTerminationCause.VERTEX_FAILURE,
 vertexEvent.getVertexTerminationCause() == null ? VertexTerminationCause.OTHER_VERTEX_FAILURE
 : vertexEvent.getVertexTerminationCause());
 DAGTerminationCause.VERTEX_FAILURE, VertexTerminationCause.OTHER_VERTEX_FAILURE);
{code}
bq. This is required so that all vertices don't get the same termination cause as the first vertex to fail ?
Yes. Otherwise every vertex would end up with the same termination cause, which doesn't make sense to me. Besides, there would be an issue in VertexImpl.checkVertexForCompletion, which checks the termination cause but does not check for ROOT_INPUT_INIT_FAILURE.

bq. Prior to the patch
bq. It's possible for events/notifications to be sent to a complete Initializer since the initializers / events are handled in separate threads. The setComplete() and isComplete checks aren't sufficient to avoid this.
bq. Ideally, completed initializers should just handle these events gracefully, but that's not something that Tez can guarantee. We need to handle such situations, likely in a separate jira.
After initialization completes, InputInitializerManager is shut down. Will that solve this issue?

bq. Will inputInitializer failures never go through this transition ? It may be better to set this up based on the SOURCE information available in the exception.
InputInitializer failures set the TerminationCause to ROOT_INPUT_INIT_FAILURE rather than AM_USERCODE_EXCEPTION, which is a special cause. Maybe we could still split AMUserCodeException into VertexManagerException/EdgeManagerException; that would be much clearer and more consistent.

bq. The state machine in VertexImpl will need to change to handle INITIALIER_FAILED in some more states, and fail the vertex.
Add more transitions to the state machine. But there will be one tricky case: INIT_SUCCEEDED followed by INIT_FAILURE. Because INIT_SUCCEEDED would shut down InputInitializerManager, in that case the InputInitializer thread would be interrupted, and 

bq. And +1 for renaming the file. Please do that just before the commit though - not as part of iterative patches.
Actually it involves more than renaming RootInputInitializerManager. The full set of renames is:
* RootInputInitializerManager -> InputInitializerManager
* TezRootInputInitializerContextImpl -> TezInputInitializerContextImpl
* VertexEventRootInputInitialized -> VertexEventInputInitialized
* VertexEventRootInputFailed -> VertexEventInputFailed
* VertexTerminationCause.ROOT_INPUT_INIT_FAILURE -> VertexTerminationCause.INPUT_INIT_FAILURE
* EventType.ROOT_INPUT_DATA_INFORMATION_EVENT -> EventType.INPUT_DATA_INFORMATION_EVENT
* EventType.ROOT_INPUT_INITIALIZER_EVENT -> EventType.INPUT_INITIALIZER_EVENT
* VertexEventType.V_ROOT_INPUT_INITIALIZED -> VertexEventType.V_INPUT_INITIALIZED
* VertexEventType.V_ROOT_INPUT_FAILED -> VertexEventType.V_INPUT_INIT_FAILED





> Exception handling for InputInitializer
> ---------------------------------------
>
>                 Key: TEZ-1703
>                 URL: https://issues.apache.org/jira/browse/TEZ-1703
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.1
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1703-2.patch, TEZ-1703-3.patch, TEZ-1703.patch
>
>
> For handleInputInitializerEvent - this should be fairly straightforward to handle. At the moment this is an inline call from within the AsyncDispatcher, and will end up causing a RuntimeException. The RuntimeException can be changed to an AMUserCodeException, which will take care of this.
> For onVertexStateUpdated, this eventually gets invoked from within RootInputInitializerManager. Would catching exceptions there and sending a RootInputInitializerFailedEvent be enough to fix this? It may require some state machine changes to handle this event in a few more states.
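
The second suggestion in the description (catch exceptions thrown by the user's onVertexStateUpdated and turn them into a failure event) can be sketched roughly as follows. This is a standalone illustration, not the RootInputInitializerManager code: StateListener, InitializerFailedEvent, NotifierSketch and notifyStateUpdate are hypothetical names, and a plain list stands in for the event dispatcher.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the user-supplied callback.
interface StateListener {
  void onVertexStateUpdated(String vertexName) throws Exception;
}

// Hypothetical failure event carrying the input name and the original exception.
final class InitializerFailedEvent {
  final String inputName;
  final Throwable cause;

  InitializerFailedEvent(String inputName, Throwable cause) {
    this.inputName = inputName;
    this.cause = cause;
  }
}

// Hypothetical notifier, for illustration only.
final class NotifierSketch {
  // Stands in for the dispatcher that would receive the failure event.
  final List<InitializerFailedEvent> failures = new ArrayList<>();

  // Invokes user code and converts any exception into a failure event
  // instead of letting it escape into the calling thread.
  void notifyStateUpdate(String inputName, StateListener listener, String vertexName) {
    try {
      listener.onVertexStateUpdated(vertexName);
    } catch (Exception e) {
      failures.add(new InitializerFailedEvent(inputName, e));
    }
  }
}
```

The vertex state machine would then need a transition for that failure event in every state where the callback can still fire.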


