Posted to mapreduce-issues@hadoop.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2014/02/14 03:45:25 UTC

[jira] [Updated] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated MAPREDUCE-5641:
-------------------------------------

    Attachment: MAPREDUCE-5641.patch

I’ve attached a preliminary version of the patch.  Once we all agree on the specifics of the design, I can add unit tests.  
The patch follows the design I outlined before: when the RM sees an AM die, it writes a file, and when the JHS sees that file, it copies the jhist and similar files to the done_intermediate dir.  I have tested this by running jobs and killing the AM.  As expected, this results in incomplete information; in some cases, some of the information doesn’t entirely make sense or is missing (e.g. there is no Finish Time if the AM didn’t actually finish).  I’ve added some code to handle these situations.  I’ve also attached a preliminary YARN patch to YARN-1731.  
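A minimal sketch of the JHS side of the design described above, for illustration only. It assumes the RM's file is an empty marker named `<jobId>.failed` in a directory the JHS polls, and that each job's history files sit in `stagingRoot/<jobId>/` under names derived from the job id. The marker naming, directory layout, file names, and the `FailedAmHistoryMover` class are all hypothetical (not taken from the patch), and local `java.nio.file` paths stand in for the HDFS `FileSystem` API the real JHS would use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not code from MAPREDUCE-5641.
 * Assumes the RM drops an empty marker "<jobId>.failed" into markerDir when an AM dies
 * and that history files live in stagingRoot/<jobId>/. Local paths stand in for HDFS.
 */
public class FailedAmHistoryMover {

    private static final String MARKER_SUFFIX = ".failed";
    // Hypothetical file names; the real jhist/summary naming may differ.
    private static final String[] HISTORY_SUFFIXES = {".jhist", ".summary"};

    /** Copies history files of every marked job into doneIntermediate; returns the job ids handled. */
    public static List<String> moveFailedHistories(Path markerDir, Path stagingRoot, Path doneIntermediate) {
        List<Path> markers = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(markerDir, "*" + MARKER_SUFFIX)) {
            stream.forEach(markers::add); // snapshot first, so markers can be deleted safely afterwards
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        List<String> handled = new ArrayList<>();
        for (Path marker : markers) {
            String name = marker.getFileName().toString();
            String jobId = name.substring(0, name.length() - MARKER_SUFFIX.length());
            Path jobStaging = stagingRoot.resolve(jobId);
            if (!Files.isDirectory(jobStaging)) {
                continue; // nothing to recover yet; keep the marker for a later scan
            }
            try {
                for (String suffix : HISTORY_SUFFIXES) {
                    // Paths are built from known names, so the staging dir never has to be listed.
                    Path src = jobStaging.resolve(jobId + suffix);
                    if (Files.exists(src)) {
                        Files.copy(src, doneIntermediate.resolve(src.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                Files.delete(marker); // consume the marker only after the copies succeeded
                handled.add(jobId);
            } catch (IOException e) {
                // Leave the marker in place so a later scan can retry this job.
            }
        }
        return handled;
    }
}
```

Building source paths from known names, instead of listing the staging directory, keeps the sketch compatible with a directory that other users can traverse but not list.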

{quote}
How will the JHS copy the file to the intermediate directory? It likely won't have access to the staging directory containing the jhist file.
{quote}
I changed the staging directory permissions from 0700 to 0701.
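The only difference between 0700 and 0701 is the execute (search) bit for "other" users. In the HDFS permission model, as in POSIX, `x` on a directory is what allows accessing a child by name, while listing the directory requires `r`. So, assuming the JHS runs as a user outside the directory's owner and group, 0701 lets it open a file whose path it already knows without being able to list the directory; the file itself and every parent directory on the path must still grant the JHS the access it needs. The small sketch below decodes both modes with `java.nio.file.attribute.PosixFilePermissions`; the `StagingDirMode` class is a hypothetical illustration, not part of the patch.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper that decodes octal directory modes such as 0700 and 0701. */
public class StagingDirMode {

    // Bits from most to least significant: owner rwx, group rwx, others rwx.
    private static final PosixFilePermission[] BITS = {
        PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE, PosixFilePermission.OWNER_EXECUTE,
        PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE, PosixFilePermission.GROUP_EXECUTE,
        PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE
    };

    /** Converts a 9-bit octal mode (e.g. 0701) into the equivalent permission set. */
    public static Set<PosixFilePermission> fromOctal(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < BITS.length; i++) {
            if ((mode & (1 << (8 - i))) != 0) { // i = 0 tests 0400, i = 8 tests 0001
                perms.add(BITS[i]);
            }
        }
        return perms;
    }

    /** True if users outside the owner and group may traverse the directory (listing would also need r). */
    public static boolean othersCanTraverse(int mode) {
        return fromOctal(mode).contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) {
        for (int mode : new int[] {0700, 0701}) {
            System.out.printf("%04o -> %s, others can traverse: %b%n",
                    mode, PosixFilePermissions.toString(fromOctal(mode)), othersCanTraverse(mode));
        }
    }
}
```

Running `main` prints `0700 -> rwx------, others can traverse: false` followed by `0701 -> rwx-----x, others can traverse: true`.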

> History for failed Application Masters should be made available to the Job History Server
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5641
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, jobhistoryserver
>    Affects Versions: 2.2.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: MAPREDUCE-5641.patch
>
>
> Currently, the JHS has no information about jobs whose AMs have failed.  This is because the AM writes the History to the intermediate folder just before finishing, so when the AM fails for any reason, this information isn't copied there.  However, it is not lost, as it's still in the AM's staging directory.  To make the History available in the JHS, all we need is another mechanism to move the History from the staging directory to the intermediate directory.  The AM also writes a "Summary" file before exiting normally, which is likewise unavailable when the AM fails.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)