Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/04/23 21:02:16 UTC
[jira] [Comment Edited] (TEZ-1074) DAGAppMaster takes lots of CPU when running a reasonably large job in the cluster
[ https://issues.apache.org/jira/browse/TEZ-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978234#comment-13978234 ]
Rajesh Balamohan edited comment on TEZ-1074 at 4/23/14 7:01 PM:
----------------------------------------------------------------
- Send only updated counters.
- No change to the heartbeat (HB) interval (defaults to 100 ms). However, counters are sent in the HB only once per second.
- DAGAppMaster (DAGAM) CPU consumption with this patch is around 10-15%.
I tried a ProtoBuf implementation as well, but did not include it in the patch because a lot of CPU time was wasted in "TaskStatusUpdateEventProto.parseFrom(eventBytes)".
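The first two bullets (send only updated counters, and attach counters to the 100 ms heartbeat only once per second) can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the code in TEZ-1074-v7.patch: the class CounterUpdateFilter, its method onHeartbeat, and the counter names are hypothetical; counters are modeled as a plain Map<String, Long> rather than TezCounters; and the caller passes in the current time so the once-per-second rule is easy to test.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Tez API: picks which counters to attach
// to a task heartbeat. Heartbeats may fire every 100 ms, but counters are
// attached at most once per counterIntervalMs, and only counters whose values
// changed since the last attachment are included.
class CounterUpdateFilter {
    private final long counterIntervalMs;
    private final Map<String, Long> lastSent = new HashMap<>(); // value last shipped per counter
    private long lastSendTimeMs;
    private boolean sentBefore = false;

    CounterUpdateFilter(long counterIntervalMs) {
        this.counterIntervalMs = counterIntervalMs;
    }

    // Called on every heartbeat; returns the counters to piggyback (possibly empty).
    Map<String, Long> onHeartbeat(long nowMs, Map<String, Long> current) {
        if (sentBefore && nowMs - lastSendTimeMs < counterIntervalMs) {
            return Collections.emptyMap(); // too soon: heartbeat goes out without counters
        }
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            // Include new counters and counters whose value differs from the last send.
            if (!e.getValue().equals(lastSent.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue()); // latest absolute value, not an increment
            }
        }
        lastSent.putAll(delta);
        lastSendTimeMs = nowMs;
        sentBefore = true;
        return delta;
    }

    public static void main(String[] args) {
        CounterUpdateFilter filter = new CounterUpdateFilter(1000);
        Map<String, Long> counters = new HashMap<>();
        counters.put("RECORDS_IN", 10L); // made-up counter names
        counters.put("RECORDS_OUT", 0L);
        System.out.println(filter.onHeartbeat(0, counters));    // both counters: first send
        counters.put("RECORDS_IN", 25L);
        System.out.println(filter.onHeartbeat(100, counters));  // {} (within 1 s of last send)
        System.out.println(filter.onHeartbeat(1000, counters)); // {RECORDS_IN=25}
        System.out.println(filter.onHeartbeat(2000, counters)); // {} (nothing changed)
    }
}
```

Sending only changed values like this assumes the receiving side (here, DAGAppMaster) keeps the last known value of each counter and merges updates into it, so a counter that is omitted simply keeps its previous value.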
> DAGAppMaster takes lots of CPU when running a reasonably large job in the cluster
> ---------------------------------------------------------------------------------
>
> Key: TEZ-1074
> URL: https://issues.apache.org/jira/browse/TEZ-1074
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2014-04-19 at 7.26.36 PM.png, TEZ-1074-v1.patch, TEZ-1074-v2.patch, TEZ-1074-v7.patch
>
>
> - Ran a job which used 200 containers.
> - DAGAppMaster was running at 70% CPU most of the time during the job.
> - Profiling showed that much of the time was spent in TezEvent.readFields --> ... --> TaskStatusUpdateEvent.readFields().
> - The default "tez.task.am.heartbeat.interval-ms.max" is 100 ms. With 200 containers heartbeating at that rate (10 heartbeats per second each), DAGAppMaster could potentially process 2000 events per second, each carrying TezCounters.
> With larger jobs, DAGAppMaster CPU usage can grow significantly.
> One option to reduce CPU usage would be to send only the modified TezCounters in TaskStatusUpdateEvent.
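To make the numbers in the description concrete, here is a rough Java model of the counter load on DAGAppMaster. The 200 containers and the 100 ms default for tez.task.am.heartbeat.interval-ms.max come from the issue; everything else is an assumption for illustration: the class HeartbeatLoad is hypothetical, each task is assumed to carry 50 counters of which 5 change per second, and counters are encoded Writable-style (DataOutputStream.writeUTF for the name, writeLong for the value), which is not the actual TezCounters format. Serialized bytes serve only as a rough proxy for the work readFields() has to do.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Back-of-the-envelope load model. The container count (200) and heartbeat
// interval (100 ms) come from the issue; the counter counts and the encoding
// below are assumptions, not the real TezCounters wire format.
class HeartbeatLoad {
    // Messages per second if every container reports once per intervalMs.
    static long perSecond(int containers, long intervalMs) {
        return containers * 1000L / intervalMs;
    }

    // Size of a hypothetical Writable-style encoding of numCounters counters:
    // an int count, then (writeUTF name, writeLong value) per counter.
    static int encodedSize(int numCounters) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(numCounters);                          // 4 bytes
            for (int i = 0; i < numCounters; i++) {
                out.writeUTF(String.format("COUNTER_%03d", i)); // 2-byte length + 11 ASCII bytes
                out.writeLong(i);                               // 8 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int containers = 200;
        long eventsPerSec = perSecond(containers, 100);       // 2000
        long counterMsgsPerSec = perSecond(containers, 1000); // 200
        int fullBytes = encodedSize(50); // assumed: 50 counters per task
        int deltaBytes = encodedSize(5); // assumed: 5 of them changed in the last second
        System.out.println("events/s: " + eventsPerSec);
        System.out.println("counter payloads/s, once per second: " + counterMsgsPerSec);
        System.out.println("counter bytes/s, all counters in every event: " + eventsPerSec * fullBytes);
        System.out.println("counter bytes/s, changed counters once per second: " + counterMsgsPerSec * deltaBytes);
    }
}
```

Under these assumptions the counter bytes the AM has to deserialize drop from 2,108,000 to 21,800 per second, roughly a 97x reduction; the real savings depend on how many counters a task actually has and how often they change.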
--
This message was sent by Atlassian JIRA
(v6.2#6252)