Posted to mapreduce-issues@hadoop.apache.org by "Haibo Chen (JIRA)" <ji...@apache.org> on 2017/06/01 21:41:04 UTC

[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033746#comment-16033746 ] 

Haibo Chen commented on MAPREDUCE-6892:
---------------------------------------

Thanks [~mozer] for reporting and fixing the issue. The map/reduce counters in JobUnsuccessfulCompletionEvent are inconsistent. In some cases, the number of successful mappers is passed to JobUnsuccessfulCompletion.finishedMaps. In other cases, the number of successful + failed + killed mappers is reported. I think we should make it at least consistent, and then add failed/killed mapper/reducer counts.
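
As a rough illustration of what "consistent" could mean here, the sketch below keeps the three outcomes in separate fields so that no single number has to mean "successful" in one code path and "successful + failed + killed" in another. The class TaskCounts and its members are hypothetical names made up for this example; they are not part of Hadoop or of the attached patch.

```java
// Hypothetical value holder for illustration only; not a Hadoop class.
final class TaskCounts {
    final int succeeded;
    final int failed;
    final int killed;

    TaskCounts(int succeeded, int failed, int killed) {
        if (succeeded < 0 || failed < 0 || killed < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        this.succeeded = succeeded;
        this.failed = failed;
        this.killed = killed;
    }

    // Number of tasks that reached any terminal state.
    int completed() {
        return succeeded + failed + killed;
    }
}
```

With a holder like this, a caller that wants only successful mappers reads succeeded, and a caller that wants every finished mapper calls completed(), so the two meanings cannot be mixed up silently.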

org.apache.hadoop.mapreduce.v2.app.job.Job is not client facing as far as I can tell, so I think it's fine to add a few more methods (killedMapTaskCount, killedReduceTaskCount). In upgrade cases though, JHS expects the new JobUnsuccessfulCompletion and JobFinishedEvent schema, but it could pick up old .jhist files that do not conform to the new schema. We want to make sure it handles that situation gracefully. Can you check that?
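
To make the upgrade concern concrete, here is a minimal sketch of a reader that tolerates records written before the new counters existed. It assumes, purely for illustration, that an event has already been decoded into a map of field names to values. The class LegacyCounts, the method readCount, and the field name failedMaps are hypothetical; only finishedMaps is taken from the discussion above. The real JHS parsing code may handle this differently.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not a Hadoop class.
final class LegacyCounts {
    private LegacyCounts() {
    }

    // Returns the stored counter, or the fallback when the record was
    // written by an older version that did not have this field.
    static int readCount(Map<String, Integer> record, String field, int fallback) {
        Integer value = record.get(field);
        return value != null ? value : fallback;
    }
}
```

Passing a fallback such as -1 rather than 0 lets the caller tell "not recorded" apart from "no failures", which is the ambiguity that made the reported getFailedMaps() result misleading in the first place.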

> Issues with the count of failed/killed tasks in the jhist file
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-6892
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, jobhistoryserver
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6892-001.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually there were failures. 
> Another minor thing is that you cannot get the number of killed tasks (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the successful map/reduce task counts. The number of failed (or killed) tasks is not stored.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
