Posted to mapreduce-issues@hadoop.apache.org by "Eli Collins (JIRA)" <ji...@apache.org> on 2011/08/11 20:56:27 UTC

[jira] [Moved] (MAPREDUCE-2833) Job Tracker needs to collect more job/task execution stats and save them to DFS file

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins moved HADOOP-1950 to MAPREDUCE-2833:
------------------------------------------------

        Key: MAPREDUCE-2833  (was: HADOOP-1950)
    Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Job Tracker needs to collect more job/task execution stats and save them to DFS file
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2833
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2833
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Runping Qi
>              Labels: newbie
>
> In order to facilitate offline analysis of the dynamic behaviors and performance characteristics of map/reduce jobs, 
> we need the job tracker to collect some data about jobs and save them to DFS files. Some of the data are in time-series form, 
> and some are not.
> Below is a preliminary list of desired data. Some items are already available in the current job tracker; some are new.
> For each map/reduce job, we need the following non-time-series data (grouped in the sketch after this list):
>    1. jobid, jobname, number of mappers, number of reducers, start time, end time, end of the mapper phase
>    2. Average (median, min, max) of successful mapper execution time, input/output records/bytes
>    3. Average (median, min, max) of unsuccessful mapper execution time, input/output records/bytes
>    4. Total mapper retries; max and average number of retries per mapper
>    5. The reasons for mapper task failures.
>    6. Average (median, min, max) of successful reducer execution time, input/output records/bytes
>            Execution time is the difference between the sort end time and the task end time
>    7. Average (median, min, max) of successful copy time (from the mapper phase end time to the sort start time).
>    8. Average (median, min, max) of successful sorting time for successful reducers
>    9. Average (median, min, max) of unsuccessful reducer execution time (from the end of the mapper phase or the start of the task, 
>        whichever is later, to the end of the task)
>    10. Total reducer retries; max and average number of retries per reducer
>    11. The reasons for reducer task failures (user code error, lost tracker, failure to write to DFS, etc.)
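>
> A minimal sketch of how these per-job aggregates might be grouped. All class and field names below are hypothetical, not existing JobTracker API:
>
>     import java.util.List;
>
>     // Hypothetical holder for the non-time-series stats listed above.
>     public class JobExecutionStats {
>       // Avg/median/min/max over one population of tasks (e.g. successful maps).
>       public static class Summary {
>         long avg, median, min, max;
>       }
>       String jobId, jobName;
>       int numMappers, numReducers;
>       long startTime, endTime, mapPhaseEndTime;            // wall-clock millis
>       Summary successfulMapTime, successfulMapRecords, successfulMapBytes;  // item 2
>       Summary failedMapTime;                               // item 3
>       int totalMapRetries, maxMapRetries;                  // item 4
>       double avgMapRetries;
>       List<String> mapFailureReasons;                      // item 5
>       Summary successfulReduceTime, copyTime, sortTime;    // items 6-8
>       Summary failedReduceTime;                            // item 9
>       int totalReduceRetries, maxReduceRetries;            // item 10
>       double avgReduceRetries;
>       List<String> reduceFailureReasons;                   // item 11
>     }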
> For each map/reduce job, we collect the following time series data (at one-minute intervals; see the sampler sketch after this list):
>     1. Numbers of pending mappers and reducers
>     2. Numbers of running mappers and reducers
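>
> A sketch of the one-minute sampling loop; JobCounts is a made-up accessor interface standing in for whatever the job tracker exposes internally:
>
>     import java.util.ArrayList;
>     import java.util.List;
>     import java.util.Timer;
>     import java.util.TimerTask;
>
>     // Hypothetical sampler: records pending/running task counts once a minute.
>     public class JobSeriesSampler {
>       public interface JobCounts {
>         int pendingMaps(); int runningMaps(); int pendingReduces(); int runningReduces();
>       }
>       public static class Sample {
>         final long time;
>         final int pendingMaps, runningMaps, pendingReduces, runningReduces;
>         Sample(long t, int pm, int rm, int pr, int rr) {
>           time = t; pendingMaps = pm; runningMaps = rm;
>           pendingReduces = pr; runningReduces = rr;
>         }
>       }
>       private final List<Sample> series = new ArrayList<Sample>();
>       private final Timer timer = new Timer(true);        // daemon thread
>
>       public void start(final JobCounts counts) {
>         timer.scheduleAtFixedRate(new TimerTask() {
>           public void run() {
>             synchronized (series) {
>               series.add(new Sample(System.currentTimeMillis(),
>                   counts.pendingMaps(), counts.runningMaps(),
>                   counts.pendingReduces(), counts.runningReduces()));
>             }
>           }
>         }, 0, 60 * 1000L);                                // one-minute interval
>       }
>     }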
> For the job tracker, we need the following data:
>     1. Number of trackers 
>     2. Start time 
>     3. End time 
>     4. The list of map/reduce jobs (their IDs, start times, and end times)
>     
> The following time series data (at one-minute intervals; see the sketch after this list):
>     1. The number of running jobs
>     2. The numbers of running mappers/reducers
>     3. The numbers of pending mappers/reducers
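>
> The per-job sampler sketched above would work the same way at the tracker level; a hypothetical per-minute cluster sample might carry:
>
>     // Hypothetical one-minute snapshot of cluster-wide counts.
>     public class ClusterSample {
>       long time;                          // sample timestamp, millis
>       int runningJobs;
>       int runningMaps, runningReduces;
>       int pendingMaps, pendingReduces;
>     }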
> The data collection should be optional: a job tracker should be able to turn it off, and in that case it should pay no collection overhead.
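>
> A sketch of the switch, reading a made-up boolean property through the standard Configuration API:
>
>     import org.apache.hadoop.conf.Configuration;
>
>     // "mapred.job.stats.collection" is a hypothetical property name;
>     // while it is false (the default), no collection code runs at all.
>     public class StatsConfig {
>       public static boolean enabled(Configuration conf) {
>         return conf.getBoolean("mapred.job.stats.collection", false);
>       }
>     }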
> The job tracker should organize the in-memory version of the collected data in such a way that:
> 1. it does not consume an excessive amount of memory (see the ring-buffer sketch after this list)
> 2. the data is suitable for presentation through the web status pages.
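>
> One way to bound memory is a fixed-capacity ring buffer that keeps only the newest samples; a sketch, not an actual JobTracker structure:
>
>     // Overwrites the oldest sample once capacity is reached.
>     public class RingBuffer<T> {
>       private final Object[] buf;
>       private int next, size;
>       public RingBuffer(int capacity) { buf = new Object[capacity]; }
>       public synchronized void add(T item) {
>         buf[next] = item;
>         next = (next + 1) % buf.length;
>         if (size < buf.length) size++;
>       }
>       public synchronized int size() { return size; }
>     }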
> The data saved to DFS files should be in the Hadoop record format.
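>
> For illustration, the per-minute job sample could be declared in the Hadoop record DDL and compiled to Java with rcc; the module and field names here are made up:
>
>     module org.apache.hadoop.mapred.stats {
>       class JobSample {
>         long time;
>         int pendingMaps;
>         int runningMaps;
>         int pendingReduces;
>         int runningReduces;
>       };
>     }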

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira