You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2014/10/27 11:21:33 UTC

[jira] [Comment Edited] (TEZ-1689) Exception handling for EdgeManagerPlugin

    [ https://issues.apache.org/jira/browse/TEZ-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185005#comment-14185005 ] 

Jeff Zhang edited comment on TEZ-1689 at 10/27/14 10:20 AM:
------------------------------------------------------------

[~seth.siddharth@gmail.com] Thanks for your review.

bq. For all AMUserCodeExceptions thrown from edge events, it will be useful to include the source and destination vertices which are connected on the Edge. AMUserCodeException could have an additional diagnosticMessage field which could be populated and logged each time.
Put the context info (source and destination vertices) in AMUserCodeException) and add them to diagnostics. 
bq. This is a question. Should setParallelism be throwing Exception or AMUserCodeException ?
Change it to AMUserCodeException. Need to do some work in TEZ-1699.
bq. edge.setCustomEdgeManager - returning false. This isn't very useful in general, since none of the VertexManagers actually do anything with the return value. Instead, we should just throw a RuntimeException - which would eventually result in a VertexManager failure and will result in the DAG failing.
throw TezUnCheckedException
bq.  * Use successSetParallelism after reset.
bq.  * DAGImpl changes - error reporting should be using getCause()
bq.  * ingore this exception, should not happen| If this is meant to be ignored, the log line should indicate this.
Fix them

bq. {code}
+  // this method would been called by all the task attempts,
+  // should always throw exception in the subsequent task attempt if
+  // exception happen in the first task attempt.
      {code} 

I think this piece of code (VertexImpl.getOutputSpecList() has some issue, it would cache the outputSpecList in its origin implementation, but I don't think we should cache it as it depends on the taskIndex, although in all the EdgeManagerPlugin Implementations, the value is the same no matter what the taskIndex is. So in the new patch, I remove the cache.

For the InputInitlizer, create TEZ-1703 for it.

Besides, I feel that the current exception handing a little complicated, I think we could just throw exception in Transition and then do the exception handling work in the handle method of DAG/Vertex/Task/TaskAttempt.  This require the state machine has a callback on transition fail, I create a YARN ticket for this: YARN-2750.  Let me know your thoughts. 



was (Author: zjffdu):
[~seth.siddharth@gmail.com] Thanks for your review.

bq. For all AMUserCodeExceptions thrown from edge events, it will be useful to include the source and destination vertices which are connected on the Edge. AMUserCodeException could have an additional diagnosticMessage field which could be populated and logged each time.
Put the context info (source and destination vertices) in AMUserCodeException) and add them to diagnostics. 
bq. This is a question. Should setParallelism be throwing Exception or AMUserCodeException ?
Change it to AMUserCodeException. Need to do some work in TEZ-1699.
bq. edge.setCustomEdgeManager - returning false. This isn't very useful in general, since none of the VertexManagers actually do anything with the return value. Instead, we should just throw a RuntimeException - which would eventually result in a VertexManager failure and will result in the DAG failing.
throw TezUnCheckedException
bq.  * Use successSetParallelism after reset.
bq.  * DAGImpl changes - error reporting should be using getCause()
bq.  * ingore this exception, should not happen| If this is meant to be ignored, the log line should indicate this.
Fix them

bq. {code}
+  // this method would been called by all the task attempts,
+  // should always throw exception in the subsequent task attempt if
+  // exception happen in the first task attempt.
      {code} 

I think this piece of code (VertexImpl.getOutputSpecList() has some issue, it would cache the outputSpecList in its origin implementation, but I don't think we should cache it as it depends on the taskIndex, although in all the EdgeManagerPlugin Implementations, the value is the same no matter what the taskIndex is. So in the new patch, I remove the cache.

For the InputInitlizer, create TEZ-1703 for it.

Besides, I feel that the current exception handing a little complicated, I think we could just throw exception in Transition and then do the exception handling work in the handle method of DAG/Vertex/Task/TaskAttempt.  This need the state machine has a callback on transition fail, I create a YARN ticket for this: YARN-2750.  Let me know your thoughts. 


> Exception handling for EdgeManagerPlugin
> ----------------------------------------
>
>                 Key: TEZ-1689
>                 URL: https://issues.apache.org/jira/browse/TEZ-1689
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: TEZ-1689-2.patch, TEZ-1689.patch
>
>
> EdgeManagePlugin and InputInitializer are both user code which could lead exception, we should handle it, fail the DAG and display the exception in client side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)