Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/09/26 20:29:50 UTC

[jira] Created: (HADOOP-1950) Job Tracker needs to collect more job/task execution stats and save them to DFS file

Job Tracker needs to collect more job/task execution stats and save them to DFS file
------------------------------------------------------------------------------------

                 Key: HADOOP-1950
                 URL: https://issues.apache.org/jira/browse/HADOOP-1950
             Project: Hadoop
          Issue Type: New Feature
            Reporter: Runping Qi



In order to facilitate offline analysis of the dynamic behavior and performance characteristics of map/reduce jobs,
we need the job tracker to collect some data about jobs and save them to DFS files. Some of the data are in time series form,
and some are not.
Below is a preliminary list of the desired data. Some of it is already available in the current job tracker; some is new.

For each map/reduce job, we need the following non-time-series data (a sketch of the summary computation follows this list):
   1. jobid, jobname, number of mappers, number of reducers, start time, end time, end of mapper phase
   2. Average (median, min, max) of successful mapper execution time, input/output records/bytes
   3. Average (median, min, max) of unsuccessful mapper execution time, input/output records/bytes
   4. Total mapper retries; max and average number of retries per mapper
   5. The reasons for mapper task failures.

   6. Average (median, min, max) of successful reducer execution time, input/output records/bytes.
           Execution time is the difference between the sort end time and the task end time.
   7. Average (median, min, max) of successful copy time (from the mapper phase end time to the sort start time).
   8. Average (median, min, max) of sorting time for successful reducers.

   9. Average (median, min, max) of unsuccessful reducer execution time (from the end of the mapper phase or the start of the task,
       whichever is later, to the end of the task)
   10. Total reducer retries; max and average number of retries per reducer
   11. The reasons for reducer task failures (user code error, lost tracker, failed to write to DFS, etc.)
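
Most of the per-task items above (2-4, 6-10) reduce to the same computation over one measurement per
task attempt (execution time in milliseconds, record count, or byte count). Below is a minimal sketch of
that computation in plain Java; the class and field names are illustrative, not taken from the Hadoop source.

    import java.util.Arrays;

    /** Illustrative helper: average/median/min/max over one measurement
     *  per task attempt (execution time, record count, byte count). */
    public class TaskStatSummary {
      public final long min, median, max;
      public final double average;

      public TaskStatSummary(long[] values) {
        if (values.length == 0) {
          throw new IllegalArgumentException("need at least one measurement");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        min = sorted[0];
        max = sorted[sorted.length - 1];
        median = sorted[sorted.length / 2];  // upper median for even-sized input
        long sum = 0;
        for (long v : sorted) {
          sum += v;
        }
        average = (double) sum / sorted.length;
      }
    }

The job tracker would keep one such summary per metric per job, computed once when the job completes.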

For each map/reduce job, we collect the following time series data (at one-minute intervals; a sampler sketch appears after the job tracker lists below):

    1. Numbers of pending mappers and reducers
    2. Numbers of running mappers and reducers

For the job tracker, we need the following data:

    1. Number of trackers 
    2. Start time 
    3. End time 
    4. The list of map/reduce jobs (their ids, start time/end time)
    
We also need the following time series data for the job tracker (at one-minute intervals; a sampler sketch follows this list):
    1. The number of running jobs
    2. The numbers of running mappers/reducers
    3. The numbers of pending mappers/reducers
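
Below is a minimal sketch of the one-minute sampling loop covering both the per-job and the tracker-wide
series. The ClusterCounts callback into the job tracker's existing bookkeeping is hypothetical; none of
these names come from the Hadoop source.

    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    /** Illustrative one-minute sampler for the time series listed above. */
    public class MinuteSampler {
      /** Hypothetical callback into the job tracker's existing counters. */
      public interface ClusterCounts {
        int runningJobs();
        int runningMappers();
        int runningReducers();
        int pendingMappers();
        int pendingReducers();
      }

      private final ScheduledExecutorService timer =
          Executors.newSingleThreadScheduledExecutor();

      public void start(final ClusterCounts counts, final List<int[]> sink) {
        timer.scheduleAtFixedRate(new Runnable() {
          public void run() {
            // One sample per minute: running jobs, running mappers/reducers,
            // pending mappers/reducers.
            sink.add(new int[] { counts.runningJobs(),
                                 counts.runningMappers(), counts.runningReducers(),
                                 counts.pendingMappers(), counts.pendingReducers() });
          }
        }, 1, 1, TimeUnit.MINUTES);
      }

      public void stop() {
        timer.shutdown();
      }
    }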


The data collection should be optional. That is, a job tracker can turn off such data collection, and
in that case it should incur no collection cost.
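
A sketch of the on/off switch; the property name here is hypothetical (the real key would be chosen
during implementation):

    import org.apache.hadoop.conf.Configuration;

    /** Illustrative guard: collection is off by default, so a tracker that
     *  leaves it disabled does no extra bookkeeping at all. */
    public class StatsCollectionConfig {
      // Hypothetical property name, not an existing Hadoop configuration key.
      public static final String COLLECT_KEY = "mapred.jobtracker.stats.collection";

      public static boolean isEnabled(Configuration conf) {
        return conf.getBoolean(COLLECT_KEY, false);
      }
    }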

The job tracker should organize the in-memory version of the collected data in such a way that:
1. it does not consume an excessive amount of memory, and
2. the data are suitable for presentation through the web status pages.
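
One way to meet both constraints is a fixed-capacity buffer per time series: old samples are evicted,
so memory stays bounded however long the cluster runs, and a snapshot can be handed to the web UI.
A minimal sketch (illustrative, not from the Hadoop source):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.LinkedList;
    import java.util.List;

    /** Illustrative fixed-capacity time-series buffer: once full, the
     *  oldest one-minute sample is dropped before a new one is added. */
    public class BoundedSeries<T> {
      private final LinkedList<T> samples = new LinkedList<T>();
      private final int capacity;

      public BoundedSeries(int capacity) {
        this.capacity = capacity;
      }

      public synchronized void add(T sample) {
        if (samples.size() == capacity) {
          samples.removeFirst();  // evict the oldest sample
        }
        samples.addLast(sample);
      }

      /** Read-only snapshot for rendering on the web status pages. */
      public synchronized List<T> snapshot() {
        return Collections.unmodifiableList(new ArrayList<T>(samples));
      }
    }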

The data saved to DFS files should be in the Hadoop record format.
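
Hadoop's record I/O defines records in a small DDL that the rcc tool compiles into serializable Java
classes. A sketch of what a per-job stats record might look like in that DDL; the module, class, and
field names are illustrative, not an existing schema:

    module org.apache.hadoop.mapred.jobstats {
        class JobStatsRecord {
            ustring jobId;
            ustring jobName;
            int numMappers;
            int numReducers;
            long startTime;
            long endTime;
            long mapPhaseEndTime;
            vector<long> successfulMapperExecTimes;
            vector<long> successfulReducerExecTimes;
        };
    }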






[jira] Commented: (HADOOP-1950) Job Tracker needs to collect more job/task execution stats and save them to DFS file

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530852 ] 

Runping Qi commented on HADOOP-1950:
------------------------------------

Just realized that the Hadoop job tracker already creates one history file per map/reduce job.
Most of the data requested in this JIRA can be regenerated from there.


