Posted to mapreduce-dev@hadoop.apache.org by "Amol Kekre (JIRA)" <ji...@apache.org> on 2011/07/21 01:43:58 UTC

[jira] [Created] (MAPREDUCE-2717) Client should be able to know why an AM crashed.

Client should be able to know why an AM crashed.
------------------------------------------------

                 Key: MAPREDUCE-2717
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2717
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
            Reporter: Amol Kekre
             Fix For: 0.23.0


Today, if an AM crashes, we have to dig through logs to find out why, which is very cumbersome. The client should print a
reason for the AM crash. Possible reasons for an AM crash:
 (1) AM container failed during localization itself.
 (2) AM container launched but failed before properly starting, e.g. due to classpath issues.
 (3) AM failed after starting properly.
 (4) AM expired and was killed by the RM.

Potential fixes:
 - For (1) and (2), the client should obtain the container status, container diagnostics, and exit code.
 - For (3), the AM should report a reason for the failure in its heartbeat to the RM, and the client should obtain
that reason from the RM.
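
To make the proposal concrete, here is a minimal sketch of how a client might turn the four cases above into a one-line message. Everything in it is hypothetical: AmCrashReport, CrashKind, AmFailureInfo, and describe are illustrative names, not Hadoop APIs, and the shape of the data returned by the RM is an assumption. The issue does not propose a fix for case (4), so the message used for it here is also an assumption.

```java
// Hypothetical sketch: none of these types or names are Hadoop APIs.
final class AmCrashReport {

    /** The four crash cases listed in the issue description. */
    enum CrashKind {
        LOCALIZATION_FAILED,  // case (1)
        FAILED_BEFORE_START,  // case (2)
        FAILED_AFTER_START,   // case (3)
        EXPIRED_BY_RM         // case (4)
    }

    /** Assumed shape of the failure details a client could fetch from the RM. */
    static final class AmFailureInfo {
        final CrashKind kind;
        final int exitCode;           // container exit code, used for (1) and (2)
        final String diagnostics;     // container diagnostics, used for (1) and (2)
        final String reportedReason;  // reason the AM sent in its heartbeat, used for (3); may be null

        AmFailureInfo(CrashKind kind, int exitCode, String diagnostics, String reportedReason) {
            this.kind = kind;
            this.exitCode = exitCode;
            this.diagnostics = diagnostics;
            this.reportedReason = reportedReason;
        }
    }

    /** Builds the one-line reason the client would print instead of sending the user to the logs. */
    static String describe(AmFailureInfo info) {
        switch (info.kind) {
            case LOCALIZATION_FAILED:
                return containerMessage("AM container failed during localization", info);
            case FAILED_BEFORE_START:
                return containerMessage("AM container failed before starting", info);
            case FAILED_AFTER_START:
                return "AM failed after starting: "
                        + (info.reportedReason != null ? info.reportedReason : "no reason reported");
            case EXPIRED_BY_RM:
                return "AM expired and was killed by the RM";
            default:
                throw new IllegalArgumentException("unknown crash kind: " + info.kind);
        }
    }

    /** Cases (1) and (2) share the same container-level details. */
    private static String containerMessage(String prefix, AmFailureInfo info) {
        return prefix + " (exit code " + info.exitCode + "): " + info.diagnostics;
    }

    public static void main(String[] args) {
        // Example for case (2): a classpath problem surfaced through container diagnostics.
        AmFailureInfo info = new AmFailureInfo(
                CrashKind.FAILED_BEFORE_START, 1, "java.lang.NoClassDefFoundError", null);
        System.out.println(describe(info));
    }
}
```

Cases (1) and (2) share a formatter because both rely on container-level data, while case (3) depends on whatever reason the AM managed to report before it died.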

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MAPREDUCE-2717) Client should be able to know why an AM crashed.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-2717.
--------------------------------------

    Resolution: Duplicate
      Assignee:     (was: Siddharth Seth)

Most of these are fixed; the remaining diagnostics part is a duplicate of MAPREDUCE-3065.

> Client should be able to know why an AM crashed.
> ------------------------------------------------
>
>                 Key: MAPREDUCE-2717
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2717
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: Amol Kekre
>            Priority: Blocker
>             Fix For: 0.23.0
