Posted to mapreduce-issues@hadoop.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2017/05/24 10:03:04 UTC

[jira] [Comment Edited] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022630#comment-16022630 ] 

Peter Bacsko edited comment on MAPREDUCE-6892 at 5/24/17 10:02 AM:
-------------------------------------------------------------------

Approach to the solution:

1. Update the Avro schema of {{JobUnsuccessfulCompletion}} and {{JobFinished}}
2. Update {{JobFinishedEvent}} with the number of killed maps/reduces
3. Modify {{JobImpl}} to set the number of killed maps/reduces in {{JobFinishedEvent}} (inside {{createJobFinishedEvent()}})
4. Update {{JobUnsuccessfulCompletionEvent}} so that it contains the number of failed/killed maps and reduces
5. Modify {{JobHistoryEventHandler}} to set the new fields in {{JobUnsuccessfulCompletionEvent}}
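
To make steps 4 and 5 concrete, here is a minimal sketch of what the extended event payload could carry. The class {{UnsuccessfulCompletionSketch}} and its field names are hypothetical and used only for illustration; the real record is Avro-generated and its final field names may differ.

```java
// Hypothetical sketch: a plain value class standing in for the extended
// JobUnsuccessfulCompletion record. The real class is Avro-generated; the
// field names below are assumptions, not the committed schema.
final class UnsuccessfulCompletionSketch {
    // Counts that, per the issue description, are already stored today.
    final int finishedMaps;
    final int finishedReduces;
    // Counts proposed in steps 1 and 4.
    final int failedMaps;
    final int failedReduces;
    final int killedMaps;
    final int killedReduces;

    UnsuccessfulCompletionSketch(int finishedMaps, int finishedReduces,
                                 int failedMaps, int failedReduces,
                                 int killedMaps, int killedReduces) {
        this.finishedMaps = finishedMaps;
        this.finishedReduces = finishedReduces;
        this.failedMaps = failedMaps;
        this.failedReduces = failedReduces;
        this.killedMaps = killedMaps;
        this.killedReduces = killedReduces;
    }

    // True if any map or reduce was recorded as failed or killed.
    boolean hasFailedOrKilled() {
        return failedMaps + failedReduces + killedMaps + killedReduces > 0;
    }
}
```

With the extra fields stored, a jhist parser would no longer have to report 0 failed maps for a job that actually had failures.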

To do #5, we need the number of failed/killed attempts, which currently cannot be retrieved from the {{Job}} interface. So we either add these methods (returning {{killedMapTaskCount}}, {{killedReduceTaskCount}}, etc. from {{JobImpl}}) or we retrieve all tasks with {{getTasks()}}, then check every task attempt individually and count the killed/failed ones.
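
The second option (walking all tasks and their attempts) could look roughly like the sketch below. The types {{Task}}, {{TaskType}} and {{AttemptState}} are simplified, hypothetical stand-ins so that the example is self-contained; they are not the real app master interfaces, and the real code would go through {{getTasks()}} on the job instead of a plain list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the MR app master's task model;
// these are NOT the real Hadoop interfaces.
class AttemptCountSketch {
    enum TaskType { MAP, REDUCE }
    enum AttemptState { SUCCEEDED, FAILED, KILLED }

    static final class Task {
        final TaskType type;
        final List<AttemptState> attempts = new ArrayList<>();

        Task(TaskType type, AttemptState... states) {
            this.type = type;
            for (AttemptState s : states) {
                attempts.add(s);
            }
        }
    }

    // Walk every attempt of every task of the given type and count the
    // attempts that ended in the given state.
    static int countAttempts(List<Task> tasks, TaskType type, AttemptState state) {
        int count = 0;
        for (Task task : tasks) {
            if (task.type != type) {
                continue;
            }
            for (AttemptState s : task.attempts) {
                if (s == state) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

This keeps the {{Job}} interface untouched, at the cost of iterating over all attempts whenever the event is created.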

Modifying {{org.apache.hadoop.mapreduce.v2.app.job.Job}} might break things outside Hadoop - I don't know whether this is something we can do without announcing it (there's no {{@InterfaceAudience}} annotation on it).

Storing the number of killed tasks is not an absolute must, but IMO it's good to have convenience methods to retrieve it.




> Issues with the count of failed/killed tasks in the jhist file
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-6892
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, jobhistoryserver
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>
> Recently we encountered some issues with the reported number of failed tasks. After parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0 even though there were failures.
> Another minor thing is that you cannot get the number of killed tasks (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the successful map/reduce task counts. The number of failed (or killed) tasks is not stored.


