Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2016/04/01 02:23:25 UTC

[jira] [Commented] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill

    [ https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220917#comment-15220917 ] 

Hitesh Shah commented on TEZ-3161:
----------------------------------

bq. In terms of alternate naming - do you have suggestions on what would be less confusing

Not sure. Perhaps fatalError() or abortProcessing(), but I don't have good suggestions, especially since fatalError is probably the name that should indicate a fatal error, rather than carrying its current non-fatal behavior.

bq. I'm OK marking it as private

Let's mark it so initially until we can figure out a clear use-case for self-kills.

bq. Any suggestion on this. Duplicate the TerminationCause to include FATAL_, and KILL_ for almost all the existing TerminationCauses ?

Wouldn't there be only one specific termination cause to indicate that the user code told the framework to abort itself or kill itself?
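
One reading of this suggestion, as a minimal sketch: keep a single cause for "user code asked to terminate" and carry the fatal / non-fatal / kill distinction in a separate field, instead of duplicating every cause with FATAL_ and KILL_ prefixes. All names below (TerminationKind, TerminationCause, TerminationReport, allowsAnotherAttempt) are hypothetical and are not the actual Tez API.

```java
// Hypothetical sketch; none of these types are taken from the Tez codebase.

// How the attempt ended, kept separate from why it ended.
enum TerminationKind { FAILED_NON_FATAL, FAILED_FATAL, KILLED }

// Why the attempt ended. Only one entry is needed for user-requested termination.
enum TerminationCause {
    APPLICATION_ERROR,
    CONTAINER_EXITED,
    USER_CODE_REQUESTED_TERMINATION
}

final class TerminationReport {
    final TerminationCause cause;
    final TerminationKind kind;

    TerminationReport(TerminationCause cause, TerminationKind kind) {
        this.cause = cause;
        this.kind = kind;
    }

    // A fatal failure would repeat on every attempt, so no further attempt is useful.
    // A kill or a non-fatal failure still leaves room for another attempt.
    boolean allowsAnotherAttempt() {
        return kind != TerminationKind.FAILED_FATAL;
    }
}

class TerminationSketch {
    public static void main(String[] args) {
        TerminationReport fatal = new TerminationReport(
            TerminationCause.USER_CODE_REQUESTED_TERMINATION, TerminationKind.FAILED_FATAL);
        TerminationReport selfKill = new TerminationReport(
            TerminationCause.USER_CODE_REQUESTED_TERMINATION, TerminationKind.KILLED);

        System.out.println(fatal.cause + " / " + fatal.kind
            + " -> another attempt: " + fatal.allowsAnotherAttempt());
        System.out.println(selfKill.cause + " / " + selfKill.kind
            + " -> another attempt: " + selfKill.allowsAnotherAttempt());
    }
}
```

With this shape the cause list stays flat, and the retry decision depends only on the kind, not on which cause was reported.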

bq. I thought it was being written to history.

The TaskAttemptFinished event is being written to history, but the failure type bit is not in the data being pushed to ATS (check TimelineHistoryEventConversion or the *JsonConversion). The proto was changed, but that is only used in Recovery.

Tests in subsequent follow-ups should be ok.




> Allow task to report different kinds of errors - fatal / kill
> -------------------------------------------------------------
>
>                 Key: TEZ-3161
>                 URL: https://issues.apache.org/jira/browse/TEZ-3161
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-3161.1.txt, TEZ-3161.2.txt, TEZ-3161.3.txt
>
>
> In some cases, task failures will be the same across all attempts - e.g. exceeding memory utilization on an operation. In this case, there's no point in running another attempt of the same task.
> There are other cases where a task may want to mark itself as KILLED - i.e. a temporary error. An example of this is pipelined shuffle.
> Tez should allow both operations.
> cc [~vikram.dixit], [~rajesh.balamohan]


