Posted to mapreduce-issues@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2013/08/19 05:09:48 UTC

[jira] [Commented] (MAPREDUCE-5466) Historyserver does not refresh the result of restarted jobs after RM restart

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743499#comment-13743499 ] 

Jian He commented on MAPREDUCE-5466:
------------------------------------

The problem is that before the AM actually exits, it sends a JobUnsuccessfulCompletionEvent, which is processed by jobHistoryEventHandler. The handler closes the EventWriter and copies the job history file from the staging directory to the done_intermediate dir. Therefore, when the Job History Server is queried, it gives back the old AM's info. So in the case of a Job_REBOOT event, we should ideally skip copying the job history/job summary files etc. to the done_intermediate dir.
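
To make the proposal concrete, here is a minimal sketch of the idea in Java. It is not the code in MAPREDUCE-5466.patch and it does not use the real JobHistoryEventHandler API: the class HistoryCloseSketch, the enum FinalState, and the methods shouldPublish and closeHistory are all hypothetical names invented for illustration. The list doneIntermediate only stands in for the done_intermediate dir.

```java
import java.util.ArrayList;
import java.util.List;

public class HistoryCloseSketch {
    // Hypothetical stand-in for the terminal states an AM attempt can report.
    // Not Hadoop's actual enum.
    public enum FinalState { SUCCEEDED, FAILED, KILLED, REBOOT }

    // Stand-in for the done_intermediate dir: job IDs whose history was published.
    public static final List<String> doneIntermediate = new ArrayList<>();

    // A rebooted AM is not the final attempt, so its history should not be
    // published; a later attempt will write the real result.
    public static boolean shouldPublish(FinalState state) {
        return state != FinalState.REBOOT;
    }

    // Called when an attempt ends. In the real handler the writer would be
    // closed here in every case; only the copy step is made conditional.
    public static void closeHistory(String jobId, FinalState state) {
        if (shouldPublish(state)) {
            doneIntermediate.add(jobId);
        }
    }

    public static void main(String[] args) {
        // First attempt ends because the RM restarted: nothing is published.
        closeHistory("job_1375923346354_0003", FinalState.REBOOT);
        // The restarted attempt finishes and publishes the real result.
        closeHistory("job_1375923346354_0003", FinalState.SUCCEEDED);
        System.out.println(doneIntermediate); // prints [job_1375923346354_0003]
    }
}
```

With this shape, the history server would only ever see files written by the attempt that actually finished the job, which is the behavior the report expects.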
                
> Historyserver does not refresh the result of restarted jobs after RM restart
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5466
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5466
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: yeshavora
>            Assignee: Jian He
>         Attachments: MAPREDUCE-5466.patch
>
>
> Restart RM when sort job is running and verify that the job passes successfully after RM restarts. 
> Once the job finishes successfully, run job status command for sort job. It shows "Job state =FAILED". Job history server does not update the result for the job which restarted after RM restart.
> hadoop job -status job_1375923346354_0003
> 13/08/08 01:24:13 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> Job: job_1375923346354_0003
> Job File: hdfs://host1:port1/history/done/2013/08/08/000000/job_1375923346354_0003_conf.xml
> Job Tracking URL : http://historyserver:port2/jobhistory/job/job_1375923346354_0003
> Uber job : false
> Number of maps: 80
> Number of reduces: 1
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due to some other reason and reason can be found in the logs.
> Counters not available. Job is retired.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira