Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2009/05/28 08:09:45 UTC

[jira] Created: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Collect information about number of tasks succeeded / total per time unit for a tasktracker. 
---------------------------------------------------------------------------------------------

                 Key: HADOOP-5931
                 URL: https://issues.apache.org/jira/browse/HADOOP-5931
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Hemanth Yamijala


Collecting the number of tasks succeeded / total per tasktracker, and being able to see these counts per hour, per day and since start time, will help us reason about things like the blacklisting strategy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721168#action_12721168 ] 

Sharad Agarwal commented on HADOOP-5931:
----------------------------------------

Had a discussion with Owen; the following came up:
- The Metrics API is an export interface, so we should not use it. We want to build the metrics natively in Hadoop, so they should not be exposed via the metrics config file.
- It is better to do the collection in the jobtracker. The restart concern will go away, as at some point we will have a heartbeat transaction log, so recovery would be generic. Having it in the jobtracker will give us more control to make scheduling decisions.


[jira] Commented: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716297#action_12716297 ] 

Hemanth Yamijala commented on HADOOP-5931:
------------------------------------------

I am assuming the moving window mechanism would be flexible enough to add new bucket sizes as required.

Regarding having the computation on the tasktracker, and reporting the status via status, one problem is that if we want to change the bucket size, it would involve a change in the status object.

Also, one requirement for this is to store this information on the JobTracker. Can you describe how this will be stored, and the mechanics with respect to lost tasktrackers, etc.?

Will this information be available if the JobTracker restarts?


[jira] Assigned: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal reassigned HADOOP-5931:
--------------------------------------

    Assignee: Sharad Agarwal


[jira] Commented: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721579#action_12721579 ] 

Hadoop QA commented on HADOOP-5931:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12411062/5931_v3.patch
  against trunk revision 785928.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/531/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/531/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/531/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/531/console

This message is automatically generated.


[jira] Updated: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-5931:
-----------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-5931:
-----------------------------------

    Attachment: 5931_v2.patch

Patch for review.


[jira] Commented: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716222#action_12716222 ] 

Sharad Agarwal commented on HADOOP-5931:
----------------------------------------

To collect stats for the last hour/day, we can have a moving window for that time period. A moving window can contain multiple time slots. The granularity of window movement/update is decided by the slot size, which could differ between time windows. For example, the hour window could have a 5-minute update granularity and the day window a 1-hour one. In that case the hour window would hold stats in 12 slots of 5 minutes each; likewise, the day window would hold stats in 24 slots of 1 hour each.

As the last slot's time is crossed, a new slot would be added and the very first one would be knocked off, hence moving the window by one slot.
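
The slot mechanics described above can be sketched roughly as follows. This is only an illustration, not code from any attached patch: the class name MovingWindow, its methods, and the choice to pass the current time in as a parameter are all assumptions made for the example. Slots in which no task finished are simply never created.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a moving time window split into fixed-size slots. */
class MovingWindow {
    /** Counts accumulated during one slot. */
    private static final class Slot {
        final long startMillis;
        int tasks;
        int tasksSucceeded;

        Slot(long startMillis) {
            this.startMillis = startMillis;
        }
    }

    private final long slotMillis;
    private final int maxSlots;
    private final Deque<Slot> slots = new ArrayDeque<>();

    MovingWindow(long windowMillis, long slotMillis) {
        if (slotMillis <= 0 || windowMillis < slotMillis) {
            throw new IllegalArgumentException("window must hold at least one slot");
        }
        this.slotMillis = slotMillis;
        this.maxSlots = (int) (windowMillis / slotMillis); // e.g. 60 min / 5 min = 12
    }

    /** Records one finished task. Times are assumed non-negative and non-decreasing. */
    void record(long nowMillis, boolean succeeded) {
        expire(nowMillis);
        long slotStart = slotStartFor(nowMillis);
        Slot last = slots.peekLast();
        if (last == null || last.startMillis != slotStart) {
            // The last slot's time has been crossed: open a new slot.
            last = new Slot(slotStart);
            slots.addLast(last);
        }
        last.tasks++;
        if (succeeded) {
            last.tasksSucceeded++;
        }
    }

    /** Total tasks in the slots still inside the window at nowMillis. */
    int tasks(long nowMillis) {
        expire(nowMillis);
        int sum = 0;
        for (Slot s : slots) {
            sum += s.tasks;
        }
        return sum;
    }

    /** Successful tasks in the slots still inside the window at nowMillis. */
    int tasksSucceeded(long nowMillis) {
        expire(nowMillis);
        int sum = 0;
        for (Slot s : slots) {
            sum += s.tasksSucceeded;
        }
        return sum;
    }

    private long slotStartFor(long nowMillis) {
        return nowMillis - (nowMillis % slotMillis);
    }

    /** Knocks off the oldest slots once they fall outside the window. */
    private void expire(long nowMillis) {
        long oldestAllowed = slotStartFor(nowMillis) - (maxSlots - 1) * slotMillis;
        while (!slots.isEmpty() && slots.peekFirst().startMillis < oldestAllowed) {
            slots.removeFirst();
        }
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        MovingWindow hour = new MovingWindow(60 * minute, 5 * minute);
        hour.record(0, true);
        hour.record(10 * minute, false);
        hour.record(61 * minute, true); // the slot starting at minute 0 has expired by now
        System.out.println(hour.tasksSucceeded(61 * minute) + " succeeded of "
            + hour.tasks(61 * minute));
    }
}
```

Because counts are kept per slot rather than per task, the window can only move in whole-slot steps, which is the granularity trade-off mentioned above.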

A simple strategy could be to collect this information in TaskTracker and report that to JobTracker via TaskTrackerStatus. A subclass could be added to TaskTrackerStatus with fields, say:
tasksSinceStarted, tasksSuccededSinceStarted,
tasksSinceInLastHour, tasksSuccededInLastHour,
tasksSinceInLastDay, tasksSuccededInLastDay

To optimize heartbeat size, we need not send the above fields with every heartbeat. They could be reported only at a certain interval (typically the minimum slot size, 5 minutes in the above example).

An alternative would be to compute all this in the JobTracker. My vote goes for doing it in the TaskTracker, as this mostly concerns the individual TaskTracker and doesn't need any global information.

Thoughts?


[jira] Updated: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-5931:
-----------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716225#action_12716225 ] 

Sharad Agarwal commented on HADOOP-5931:
----------------------------------------

Correction: The field names in the last comment should read:
tasksSinceStarted, tasksSuccededSinceStarted,
tasksInLastHour, tasksSuccededInLastHour,
tasksInLastDay, tasksSuccededInLastDay


[jira] Commented: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720094#action_12720094 ] 

Sharad Agarwal commented on HADOOP-5931:
----------------------------------------

Had an offline discussion with Devaraj/Eric. The concern raised is that the metrics context is an export interface and that, instead of using it, we should collect the metrics natively in Hadoop. Administrators should not be able to remove this metric, as it may in future be used by the JobTracker to make decisions. Right?
Let me clarify a bit. Please note that only the time windows are configured in the metrics properties, not the actual metric name that gets collected. Also, a new context name, "tasktracker", is defined (refer to hadoop-metrics.properties in the patch), so it does not come in between the existing metric contexts. Those can continue to be chukwa/ganglia etc.
If this doesn't sound like a good idea, I see a few options:
1. Give the added context a better name, say "core-mapred", so that administrators don't override it. It would serve only to add/remove time windows.

2. Do not use the Metrics API. Expose the time window configuration via mapred-site.xml.

3. Don't expose the configuration at all and have fixed windows, say "last hour" and "last day".

I went with extending the Metrics API because I thought it would help to collect any other existing metrics in time windows without much change to the code. For example, if we want to collect "mapred" metrics in time windows, the "mapred" context can point to the Composite context, which can be configured to use multiple contexts, one being the time window context.

Thoughts?


[jira] Updated: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-5931:
-----------------------------------

    Fix Version/s: 0.21.0
           Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718268#action_12718268 ] 

Sharad Agarwal commented on HADOOP-5931:
----------------------------------------

bq. I am assuming the moving window mechanism would be flexible enough to add new bucket sizes as required. 
Yes. I am planning to use and extend the metrics framework available in core, through which custom window/bucket sizes can be defined.

bq. Regarding having the computation on the tasktracker, and reporting the status via status, one problem is that if we want to change the bucket size, it would involve a change in the status object.
To avoid that, instead of the above fields, we can have, say, a List<MetricInfo> metrics field in TaskTrackerStatus, where MetricInfo could be:
class MetricInfo {
    String name;
    int tasks;
    int tasksSucceeded;
}
Here name would be the name of the metric, e.g. "lasthour", "lastday" etc., which could be configured in the metrics property file.
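
To illustrate why a list of named entries avoids changing the status object when windows are added, here is a rough, self-contained sketch. It is not taken from the patch: TrackerMetricsReport is a hypothetical stand-in for the relevant part of TaskTrackerStatus, and the wire format uses plain java.io data streams rather than Hadoop's own serialization interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Counts for one named time window, mirroring the fields proposed above. */
final class MetricInfo {
    final String name;          // e.g. "lasthour", "lastday"
    final int tasks;
    final int tasksSucceeded;

    MetricInfo(String name, int tasks, int tasksSucceeded) {
        this.name = name;
        this.tasks = tasks;
        this.tasksSucceeded = tasksSucceeded;
    }
}

/** Hypothetical stand-in for the metrics portion of a tracker status. */
final class TrackerMetricsReport {
    final List<MetricInfo> metrics = new ArrayList<>();

    /** Writes a count followed by (name, tasks, tasksSucceeded) per entry. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(metrics.size());
            for (MetricInfo m : metrics) {
                out.writeUTF(m.name);
                out.writeInt(m.tasks);
                out.writeInt(m.tasksSucceeded);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back whatever windows the sender chose to report. */
    static TrackerMetricsReport fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            TrackerMetricsReport report = new TrackerMetricsReport();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                int tasks = in.readInt();
                int tasksSucceeded = in.readInt();
                report.metrics.add(new MetricInfo(name, tasks, tasksSucceeded));
            }
            return report;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackerMetricsReport sent = new TrackerMetricsReport();
        sent.metrics.add(new MetricInfo("lasthour", 12, 10));
        sent.metrics.add(new MetricInfo("lastday", 200, 190));
        for (MetricInfo m : fromBytes(sent.toBytes()).metrics) {
            System.out.println(m.name + ": " + m.tasksSucceeded + " / " + m.tasks);
        }
    }
}
```

Since the receiver reads a count and then that many entries, a newly configured window only adds one more entry to the list; neither class needs a new field.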

bq. Also, one requirement for this is to store this information on the JobTracker. Can you describe how this will be stored, mechanics with respect to lost tasktrackers etc ?
Currently the jobtracker doesn't store any information about lost tasktrackers. Storing info about lost trackers is not trivial and demands a separate JIRA issue. Consider the case of a tracker getting lost and never coming back, or coming back on a different port. The jobtracker data structures need to be cleaned up for such trackers; otherwise they would linger forever.

bq. Will this information be available if the JobTracker restarts ?
Yes. Since this info is propagated from the tasktracker, it would be available after the jobtracker restarts.


[jira] Updated: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-5931:
-----------------------------------

    Attachment: 5931_v1.patch

This patch adds a MovingWindowContext class which captures the metrics in a moving time window. The window and bucket sizes can be configured using hadoop-metrics.properties.
It is a very early patch. Testing is in progress. Not all fields are captured yet.


[jira] Updated: (HADOOP-5931) Collect information about number of tasks succeeded / total per time unit for a tasktracker.

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-5931:
-----------------------------------

    Attachment: 5931_v3.patch

The attached patch collects the metrics in the jobtracker. It doesn't use the Metrics API. It defines a new class, StatisticsCollector, which keeps statistics in time windows.
Stats are collected for LAST_HOUR, LAST_DAY and SINCE_START. The stats are shown in the jobtracker web UI on the trackers list page.
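
For readers without the patch at hand, the idea of keeping per-tracker counts for these three windows on the jobtracker side might look roughly like the sketch below. It is not the StatisticsCollector from the patch: the class TrackerStatsSketch and its methods are hypothetical, and for brevity it stores one timestamped event per finished task instead of fixed-size slots.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical jobtracker-side collector of per-tracker task statistics. */
class TrackerStatsSketch {
    enum Window {
        LAST_HOUR(60L * 60 * 1000),
        LAST_DAY(24L * 60 * 60 * 1000),
        SINCE_START(Long.MAX_VALUE);

        final long lengthMillis;

        Window(long lengthMillis) {
            this.lengthMillis = lengthMillis;
        }
    }

    private static final class Event {
        final long timeMillis;
        final boolean succeeded;

        Event(long timeMillis, boolean succeeded) {
            this.timeMillis = timeMillis;
            this.succeeded = succeeded;
        }
    }

    private static final class PerTracker {
        final Deque<Event> recent = new ArrayDeque<>(); // events from the last day only
        int totalTasks;                                  // SINCE_START counters
        int totalSucceeded;
    }

    private final Map<String, PerTracker> trackers = new HashMap<>();

    /** Called when a task on the given tracker finishes; times must not go backwards. */
    void taskCompleted(String trackerName, long nowMillis, boolean succeeded) {
        PerTracker t = trackers.computeIfAbsent(trackerName, k -> new PerTracker());
        t.totalTasks++;
        if (succeeded) {
            t.totalSucceeded++;
        }
        t.recent.addLast(new Event(nowMillis, succeeded));
        // Nothing older than the longest bounded window is ever queried, so drop it.
        long cutoff = nowMillis - Window.LAST_DAY.lengthMillis;
        while (t.recent.peekFirst().timeMillis <= cutoff) {
            t.recent.removeFirst();
        }
    }

    /** Returns {tasks, tasksSucceeded} for the tracker in the given window. */
    int[] stats(String trackerName, Window window, long nowMillis) {
        PerTracker t = trackers.get(trackerName);
        if (t == null) {
            return new int[] {0, 0};
        }
        if (window == Window.SINCE_START) {
            return new int[] {t.totalTasks, t.totalSucceeded};
        }
        long cutoff = nowMillis - window.lengthMillis;
        int tasks = 0;
        int succeeded = 0;
        for (Event e : t.recent) {
            if (e.timeMillis > cutoff) {
                tasks++;
                if (e.succeeded) {
                    succeeded++;
                }
            }
        }
        return new int[] {tasks, succeeded};
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        TrackerStatsSketch stats = new TrackerStatsSketch();
        stats.taskCompleted("tracker1", 0, true);
        stats.taskCompleted("tracker1", 2 * hour, false);
        for (Window w : Window.values()) {
            int[] s = stats.stats("tracker1", w, 3 * hour);
            System.out.println(w + ": " + s[1] + " / " + s[0]);
        }
    }
}
```

Keeping the state in one place on the jobtracker means it is lost on a jobtracker restart unless it is recovered some other way, which is the restart concern discussed earlier in this thread.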
