Posted to mapreduce-issues@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2012/08/15 17:54:38 UTC
[jira] [Updated] (MAPREDUCE-1684) ClusterStatus can be cached in CapacityTaskScheduler.assignTasks()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated MAPREDUCE-1684:
------------------------------------
Attachment: mapreduce-1684-v1.0.2-1.patch
bq. Currently, CapacityTaskScheduler.assignTasks() calls getClusterStatus() thrice
I think it calls getClusterStatus() up to #jobs times in the worst case.
For each heartbeat from a TaskTracker with some slots available:
{noformat}
heartbeat --> assignTasks
  --> addMap/ReduceTasks
    --> TaskSchedulingMgr.assignTasks
      --> For each queue : queuesForAssigningTasks
        --> getTaskFromQueue(queue)
          --> For each j : queue.getRunningJobs()
            --> obtainNewTask --> **getClusterStatus**
{noformat}
bq. It can be cached in assignTasks() and re-used.
Attaching a patch. Would this work?
The motivation is that getClusterStatus() shows up far too often in our jstack dumps, holding the global JobTracker lock.
{noformat}
"IPC Server handler 15 on 50300" daemon prio=10 tid=0x000000005fc5d800 nid=0x6828 runnable [0x0000000044847000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.mapred.JobTracker.getClusterStatus(JobTracker.java:4065)
- locked <0x00002aab6e638bd8> (a org.apache.hadoop.mapred.JobTracker)
at org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:503)
at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:322)
at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:419)
at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:150)
at org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTasks(CapacityTaskScheduler.java:1075)
at org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1044)
- locked <0x00002aab6e7ffb10> (a org.apache.hadoop.mapred.CapacityTaskScheduler)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
- locked <0x00002aab6e638bd8> (a org.apache.hadoop.mapred.JobTracker)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
{noformat}
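To illustrate the idea (this is not the attached patch, only a minimal sketch): Tracker and Status below are hypothetical stand-ins for JobTracker and ClusterStatus, and the loops only mimic the queue/job iteration in the call chain above. The sketch counts how many times the synchronized call is made per heartbeat with and without caching.

```java
public class CachedStatusSketch {
    /** Hypothetical stand-in for ClusterStatus; carries only a slot capacity. */
    static final class Status {
        final int maxMapTasks;
        Status(int maxMapTasks) { this.maxMapTasks = maxMapTasks; }
    }

    /** Hypothetical stand-in for JobTracker; counts calls to the locked method. */
    static final class Tracker {
        int statusCalls = 0;

        synchronized Status getClusterStatus() {
            statusCalls++;
            return new Status(100);
        }
    }

    /** Uncached: every running job considered triggers a locked call. */
    static int assignUncached(Tracker tracker, int queues, int jobsPerQueue) {
        int considered = 0;
        for (int q = 0; q < queues; q++) {
            for (int j = 0; j < jobsPerQueue; j++) {
                Status s = tracker.getClusterStatus(); // one locked call per job
                if (s.maxMapTasks > 0) {
                    considered++;
                }
            }
        }
        return considered;
    }

    /** Cached: fetch one snapshot per heartbeat and reuse it for every job. */
    static int assignCached(Tracker tracker, int queues, int jobsPerQueue) {
        Status snapshot = tracker.getClusterStatus(); // single locked call
        int considered = 0;
        for (int q = 0; q < queues; q++) {
            for (int j = 0; j < jobsPerQueue; j++) {
                if (snapshot.maxMapTasks > 0) {
                    considered++;
                }
            }
        }
        return considered;
    }

    public static void main(String[] args) {
        Tracker uncached = new Tracker();
        assignUncached(uncached, 3, 4);

        Tracker cached = new Tracker();
        assignCached(cached, 3, 4);

        System.out.println("uncached calls: " + uncached.statusCalls);
        System.out.println("cached calls: " + cached.statusCalls);
    }
}
```

With 3 queues of 4 running jobs each, the uncached variant makes 12 locked calls and the cached one makes 1. The trade-off is that the snapshot can be slightly stale within a single heartbeat.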
> ClusterStatus can be cached in CapacityTaskScheduler.assignTasks()
> ------------------------------------------------------------------
>
> Key: MAPREDUCE-1684
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1684
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: capacity-sched
> Reporter: Amareshwari Sriramadasu
> Attachments: mapreduce-1684-v1.0.2-1.patch
>
>
> Currently, CapacityTaskScheduler.assignTasks() calls getClusterStatus() thrice: once in assignTasks(), once in MapTaskScheduler and once in ReduceTaskScheduler. It can be cached in assignTasks() and re-used.