Posted to common-dev@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2008/08/27 18:19:44 UTC

[jira] Commented: (HADOOP-3748) Flag to make tasks to send counter information only at the end of the task

    [ https://issues.apache.org/jira/browse/HADOOP-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626171#action_12626171 ] 

Alejandro Abdelnur commented on HADOOP-3748:
--------------------------------------------

We ran tests with jobs that run for 30 minutes and use all task slots in the cluster (8 tasks per slave). We ran the same jobs with 0 counters and with 200 counters, enabling the counters via a configuration property.

8-node cluster: no noticeable increase in avg network pgk_in traffic on the JT box when using counters.
100-node cluster: 28% increase in avg network pgk_in traffic on the JT box when using counters.
200-node cluster: 60% increase in avg network pgk_in traffic on the JT box when using counters (with 10x peaks).

As we suspected, the more nodes in the cluster, the higher the impact of using a large number of counters. Hence the proposed approach.

We want to leverage Hadoop counters because Hadoop takes care of aggregating and reporting them, and it keeps them consistent when tasks fail. We use them to track what happens to records during processing.
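
To make the proposal concrete, here is a rough sketch of how a per-group "report at end of task" flag could behave on the task side. This is not Hadoop's Counters API and not an actual patch: the class DeferredCounters, its methods, and the group names are all hypothetical, chosen only to illustrate the idea that counters in flagged groups are held back from periodic status updates and sent once when the task completes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Hadoop. */
public class DeferredCounters {
    // Counter groups flagged to be reported only at the end of the task.
    private final Set<String> deferredGroups;
    // group name -> (counter name -> current value)
    private final Map<String, Map<String, Long>> values = new TreeMap<>();

    public DeferredCounters(Set<String> deferredGroups) {
        this.deferredGroups = new HashSet<>(deferredGroups);
    }

    /** Increments a counter locally; nothing is sent at this point. */
    public void increment(String group, String counter, long delta) {
        values.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(counter, delta, Long::sum);
    }

    /**
     * Builds the counter payload for one status update.
     * While the task is running, flagged groups are left out;
     * once taskDone is true, every group is included.
     */
    public Map<String, Map<String, Long>> snapshot(boolean taskDone) {
        Map<String, Map<String, Long>> out = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> e : values.entrySet()) {
            if (taskDone || !deferredGroups.contains(e.getKey())) {
                out.put(e.getKey(), new TreeMap<>(e.getValue()));
            }
        }
        return out;
    }
}
```

With no groups flagged, snapshot(false) returns everything, which corresponds to keeping the current streaming behavior as the default.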




> Flag to make tasks to send counter information only at the end of the task
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-3748
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3748
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>
> Currently counters are streamed from the task to the JobTracker as the task progresses. If the number of counters is large, this has a significant impact on network traffic as well as on the JobTracker load.
> There should be a flag, for example per counter group, that indicates that the counters are to be reported at the end of the task. By default this flag should be set to false for all counter groups, maintaining the current behavior.
