Posted to mapreduce-issues@hadoop.apache.org by "Eli Collins (JIRA)" <ji...@apache.org> on 2011/08/11 20:56:27 UTC
[jira] [Moved] (MAPREDUCE-2833) Job Tracker needs to collect more
job/task execution stats and save them to DFS file
[ https://issues.apache.org/jira/browse/MAPREDUCE-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins moved HADOOP-1950 to MAPREDUCE-2833:
------------------------------------------------
Key: MAPREDUCE-2833 (was: HADOOP-1950)
Project: Hadoop Map/Reduce (was: Hadoop Common)
> Job Tracker needs to collect more job/task execution stats and save them to DFS file
> ------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2833
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2833
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Runping Qi
> Labels: newbie
>
> In order to facilitate offline analysis of the dynamic behaviors and performance characteristics of map/reduce jobs,
> we need the job tracker to collect some data about jobs and save them to DFS files. Some data are in time series form,
> and some are not.
> Below is a preliminary list of desired data. Some of them are already available in the current job trackers. Some are new.
> For each map/reduce job, we need the following non time series data:
> 1. jobid, jobname, number of mappers, number of reducers, start time, end time, end of mapper phase
> 2. Average (median, min, max) of successful mapper execution time, input/output records/bytes
> 3. Average (median, min, max) of unsuccessful mapper execution time, input/output records/bytes
> 4. Total mapper retries; max and average number of retries per mapper
> 5. The reasons for mapper task failures
> 6. Average (median, min, max) of successful reducer execution time, input/output records/bytes
> Execution time is the difference between the sort end time and the task end time
> 7. Average (median, min, max) of successful copy time (from the mapper phase end time to the sort start time).
> 8. Average (median, min, max) of successful sorting time for successful reducers
> 9. Average (median, min, max) of unsuccessful reducer execution time (from the end of mapper phase or the start of the task,
> whichever is later, to the end of the task)
> 10. Total reducer retries; max and average number of retries per reducer
> 11. The reasons for reducer task failures (user code error, lost tracker, failed to write to DFS, etc.)
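The average/median/min/max summaries requested above can be computed once per metric over the finished tasks' samples. A minimal sketch, assuming a hypothetical helper class (the name `TaskStatSummary` is illustrative, not existing JobTracker code):

```java
import java.util.Arrays;

// Hypothetical helper for the summary statistics described above.
// Assumes a non-empty sample array (e.g. per-task execution times in ms).
public class TaskStatSummary {
    public final long min, max;
    public final double avg, median;

    public TaskStatSummary(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        min = sorted[0];
        max = sorted[sorted.length - 1];
        long sum = 0;
        for (long s : sorted) sum += s;
        avg = (double) sum / sorted.length;
        int mid = sorted.length / 2;
        // Median: middle element for odd counts, mean of the two middle
        // elements for even counts.
        median = (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

The same helper would apply uniformly to execution times, copy times, sort times, and input/output record/byte counts.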
> For each map/reduce job, we collect the following time series data (at a one-minute interval):
> 1. Number of pending mappers and reducers
> 2. Number of running mappers and reducers
> For the job tracker, we need the following data:
> 1. Number of trackers
> 2. Start time
> 3. End time
> 4. The list of map/reduce jobs (their ids, start time, and end time)
>
> We also need the following time series data (at a one-minute interval):
> 1. The number of running jobs
> 2. The number of running mappers/reducers
> 3. The number of pending mappers/reducers
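The one-minute time series above could be captured by a small periodic sampler. A sketch under stated assumptions: the `MinuteSampler` and `Sample` names are hypothetical, and the `Supplier` stands in for whatever counter source the job tracker actually exposes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical one-minute sampler for the time-series counters above;
// names are illustrative, not JobTracker internals.
public class MinuteSampler {
    public static class Sample {
        public final long timeMs;
        public final int pendingMaps, runningMaps, pendingReduces, runningReduces;
        public Sample(long t, int pm, int rm, int pr, int rr) {
            timeMs = t; pendingMaps = pm; runningMaps = rm;
            pendingReduces = pr; runningReduces = rr;
        }
    }

    private final List<Sample> series = new ArrayList<>();
    private final Supplier<Sample> source;  // reads current counts from the tracker

    public MinuteSampler(Supplier<Sample> source) { this.source = source; }

    // Take one sample now; the scheduler below calls this every minute.
    public synchronized void sampleOnce() { series.add(source.get()); }

    public synchronized List<Sample> snapshot() { return new ArrayList<>(series); }

    public ScheduledExecutorService start() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::sampleOnce, 0, 1, TimeUnit.MINUTES);
        return ses;
    }
}
```

The accumulated snapshot could then be flushed to a DFS file periodically or at job completion.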
> The data collection should be optional. That is, a job tracker can turn off such data collection, and
> in that case, it should not pay the cost.
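One way to make a disabled tracker pay essentially nothing is an early-exit guard on every recording call. A minimal sketch (the `StatsCollector` class and its methods are hypothetical, not existing code):

```java
// Hypothetical guard pattern: when collection is disabled, every
// recording call is a cheap no-op, so the tracker pays no cost.
public class StatsCollector {
    private final boolean enabled;
    private long mapFinishCount = 0;

    public StatsCollector(boolean enabled) { this.enabled = enabled; }

    public void mapFinished(long execTimeMs) {
        if (!enabled) return;  // early exit: disabled trackers skip all work
        mapFinishCount++;
        // ... record execTimeMs for later summary computation ...
    }

    public long getMapFinishCount() { return mapFinishCount; }
}
```

The enabled flag would presumably come from the job tracker's configuration at startup.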
> The job tracker should organize the in memory version of the collected data in such a way that:
> 1. it does not consume an excessive amount of memory
> 2. the data may be suitable for presenting through the Web status pages.
> The data saved on DFS files should be in hadoop record format.
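Whatever format is used, each sample needs a fixed, versionable on-disk layout. As a stand-in sketch, the round trip below uses plain `java.io` DataOutput framing rather than the actual Hadoop record runtime, purely to illustrate the idea; the `SampleIO` class and field set are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for record serialization: writes one time-series sample in a
// fixed field order so it can be read back deterministically.
public class SampleIO {
    public static byte[] write(long timeMs, int pendingMaps, int runningMaps)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeLong(timeMs);       // sample timestamp
        out.writeInt(pendingMaps);   // counters, in declaration order
        out.writeInt(runningMaps);
        out.flush();
        return bos.toByteArray();
    }

    public static long[] read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return new long[] { in.readLong(), in.readInt(), in.readInt() };
    }
}
```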
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira