Posted to dev@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2016/01/07 23:58:39 UTC

[jira] [Created] (TEZ-3028) Improvements to error handling

Siddharth Seth created TEZ-3028:
-----------------------------------

             Summary: Improvements to error handling
                 Key: TEZ-3028
                 URL: https://issues.apache.org/jira/browse/TEZ-3028
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Siddharth Seth


There are several places where exceptions can reach the Dispatcher, which can cause a restart. Some of these may be valid and need to be evaluated.
e.g. TaskCommunicatorManager tracks known containers etc. In case of an error, it throws an unchecked exception, which I believe will reach the dispatcher directly. (Something like this happening would indicate a bug in the framework.) Should this trigger a restart of the AM, or a shutdown with an internal error?

The TaskSchedulerManager handles exceptions while processing events and dispatches a generic INTERNAL_ERROR to the DAGAppMaster. This can be augmented with the reason for the error so that diagnostics are displayed correctly (or at least posted to the history service).
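
A rough sketch of what carrying the reason along with the error event could look like. This is illustrative only: ErrorEventSketch, ErrorEvent, EventType, toErrorEvent and process are hypothetical names made up for this example, not the actual Tez classes or event types, and the diagnostics format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are the real Tez classes.
public class ErrorEventSketch {

    enum EventType { INTERNAL_ERROR }

    // An error event that carries a human-readable reason alongside its type.
    static final class ErrorEvent {
        final EventType type;
        final String diagnostics;

        ErrorEvent(EventType type, String diagnostics) {
            this.type = type;
            this.diagnostics = diagnostics;
        }
    }

    // Builds the event from the failing component's name and the caught exception.
    static ErrorEvent toErrorEvent(String source, Throwable t) {
        String diagnostics = "Error in " + source + ": "
            + t.getClass().getName() + ": " + t.getMessage();
        return new ErrorEvent(EventType.INTERNAL_ERROR, diagnostics);
    }

    // Runs one unit of event processing; on failure, records an error event
    // instead of letting the exception propagate to the caller.
    static void process(String source, Runnable work, List<ErrorEvent> sink) {
        try {
            work.run();
        } catch (RuntimeException e) {
            sink.add(toErrorEvent(source, e));
        }
    }

    public static void main(String[] args) {
        List<ErrorEvent> sink = new ArrayList<>();
        process("TaskSchedulerManager", () -> {
            throw new IllegalStateException("unknown container");
        }, sink);
        System.out.println(sink.get(0).type + " " + sink.get(0).diagnostics);
    }
}
```

With something along these lines, whoever receives the event has the component name and the exception message available for diagnostics, rather than only a generic error type.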



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)