Posted to user@hadoop.apache.org by Krishna Kishore Bonagiri <wr...@gmail.com> on 2014/08/10 17:29:16 UTC
100% CPU consumption by Resource Manager process
Hi,
My YARN Resource Manager is consuming 100% CPU while I am running an
application that runs for about 10 hours and requests as many as 27000
containers. The CPU consumption was very low at the start of my
application and gradually rose to over 100%. Is this a known issue,
or are we doing something wrong?
Every dump of the Event Processor thread shows it running
LeafQueue::assignContainers(), specifically the for loop below from
LeafQueue.java; it seems to be looping through some priority list.
// Try to assign containers to applications in order
for (FiCaSchedulerApp application : activeApplications) {
  ...
  // Schedule in priority order
  for (Priority priority : application.getPriorities()) {
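To make the suspected cost concrete, here is a small, self-contained sketch (not the actual scheduler code, and the map below is a hypothetical stand-in for the application's priority collection): if each request round adds a new priority that is never retired, every scheduling pass walks an ever-growing TreeMap, so total work grows quadratically with the number of rounds — which would match CPU usage starting low and climbing as the run progresses.

```java
import java.util.TreeMap;

public class PriorityLoopCost {
    public static void main(String[] args) {
        // Hypothetical stand-in for the per-application priority map that
        // LeafQueue.assignContainers() iterates on every scheduling event.
        TreeMap<Integer, Integer> priorities = new TreeMap<>();
        long iterations = 0;
        int rounds = 1000; // scaled down from the ~27000 rounds in the thread
        for (int round = 0; round < rounds; round++) {
            priorities.put(round, 1); // a new priority per round, never removed
            // Each scheduling pass walks the whole map (the TreeMap$KeyIterator
            // visible in the third thread dump), so per-pass cost grows.
            for (Integer p : priorities.keySet()) {
                iterations++;
            }
        }
        // Total work is rounds*(rounds+1)/2 -- quadratic, not linear.
        System.out.println(iterations); // prints 500500
    }
}
```

With 27000 rounds the same sum is ~364 million iterator steps per full replay, which is consistent with the Event Processor thread spending its time in TreeMap.successor().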
3XMTHREADINFO "ResourceManager Event Processor"
J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range
from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
3XMCPUTIME *CPU usage total: 42334.614623696 secs*
3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
(0x4FE8)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
entry count: 1)
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
entry count: 2)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
Code))
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
Code))
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
3XMTHREADINFO "ResourceManager Event Processor"
J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range
from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
3XMCPUTIME CPU usage total: 42379.604203548 secs
3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
(0xDFC0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
entry count: 1)
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
entry count: 2)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
Code))
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
Code))
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
3XMTHREADINFO "ResourceManager Event Processor"
J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range
from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
3XMCPUTIME CPU usage total: 42996.394528764 secs
3XMHEAPALLOC Heap bytes allocated since last GC cycle=475576
(0x741B8)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at
java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
4XESTACKTRACE at
java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
Code))
4XESTACKTRACE at
java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
entry count: 1)
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
entry count: 2)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
entry count: 1)
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
Code))
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
Code))
4XESTACKTRACE at
org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
Thanks,
Kishore
Re: 100% CPU consumption by Resource Manager process
Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Thanks Wangda, I think I reduced this value while I was trying to reduce
the container allocation time.
-Kishore
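For reference, the heartbeat setting discussed in the quoted thread lives in yarn-site.xml; a sketch of the suggested change (property name as given in the thread; the 1000 ms value is Wangda's suggestion, not a verified tuning):

```xml
<!-- yarn-site.xml: raise the NodeManager heartbeat interval from 50 ms
     to the suggested 1000 ms, so NMs trigger far fewer scheduling
     events on the ResourceManager. -->
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```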
On Tue, Aug 19, 2014 at 7:39 AM, Wangda Tan <wh...@gmail.com> wrote:
> Hi Krishna,
>
> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
> your configuration?
> 50
>
> I think this config is problematic; too small a heartbeat interval will
> cause the NM to contact the RM too often. I would suggest setting this
> value larger, e.g. 1000.
>
> Thanks,
> Wangda
>
>
>
> On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Wangda,
>> Thanks for the reply; here are the details, please see if you can
>> suggest anything.
>>
>> 1) Number of nodes and running app in the cluster
>> 2 nodes, and I am running my own application that keeps asking for
>> containers:
>> a) running something on the containers,
>> b) releasing the containers,
>> c) asking for more containers with an incremented priority value, and
>> repeating the same process
>>
>> 2) What's the version of your Hadoop?
>> Apache Hadoop 2.4.0
>>
>> 3) Have you set
>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>> No
>>
>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>> in your configuration?
>> 50
>>
>>
>>
>>
>> On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan <wh...@gmail.com> wrote:
>>
>>> Hi Krishna,
>>> To get more understanding about the problem, could you please share
>>> following information:
>>> 1) Number of nodes and running app in the cluster
>>> 2) What's the version of your Hadoop?
>>> 3) Have you set
>>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>>> in your configuration?
>>>
>>> Thanks,
>>> Wangda Tan
>>>
>>>
>>>
>>> On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>> My YARN Resource Manager is consuming 100% CPU while I am running an
>>>> application that runs for about 10 hours and requests as many as 27000
>>>> containers. The CPU consumption was very low at the start of my
>>>> application and gradually rose to over 100%. Is this a known issue,
>>>> or are we doing something wrong?
>>>>
>>>> Every dump of the Event Processor thread shows it running
>>>> LeafQueue::assignContainers(), specifically the for loop below from
>>>> LeafQueue.java; it seems to be looping through some priority list.
>>>>
>>>> // Try to assign containers to applications in order
>>>> for (FiCaSchedulerApp application : activeApplications) {
>>>>   ...
>>>>   // Schedule in priority order
>>>>   for (Priority priority : application.getPriorities()) {
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME *CPU usage total: 42334.614623696 secs*
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
>>>> (0x4FE8)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME CPU usage total: 42379.604203548 secs
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
>>>> (0xDFC0)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME CPU usage total: 42996.394528764 secs
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC
>>>> cycle=475576 (0x741B8)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>
Re: 100% CPU consumption by Resource Manager process
Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Thanks Wangda, I think I have reduced this when I was trying to reduce the
container allocation time.
-Kishore
On Tue, Aug 19, 2014 at 7:39 AM, Wangda Tan <wh...@gmail.com> wrote:
> Hi Krishna,
>
> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
> your configuration?
> 50
>
> I think this config is problematic, too small heartbeat-interval will
> cause NM contact RM too often. I would suggest you can set this value
> larger like 1000.
>
> Thanks,
> Wangda
>
>
>
> On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Wangda,
>> Thanks for the reply, here are the details, please see if you could
>> suggest anything.
>>
>> 1) Number of nodes and running app in the cluster
>> 2 nodes, and I am running my own application that keeps asking for
>> containers,
>> a) running something on the containers,
>> b) releasing the containers,
>> c) ask for more containers with incremented priority value, and repeat
>> the same process
>>
>> 2) What's the version of your Hadoop?
>> apache hadoop-2.4.0
>>
>> 3) Have you set
>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>> No
>>
>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>> in your configuration?
>> 50
>>
>>
>>
>>
>> On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan <wh...@gmail.com> wrote:
>>
>>> Hi Krishna,
>>> To get more understanding about the problem, could you please share
>>> following information:
>>> 1) Number of nodes and running app in the cluster
>>> 2) What's the version of your Hadoop?
>>> 3) Have you set
>>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>>> in your configuration?
>>>
>>> Thanks,
>>> Wangda Tan
>>>
>>>
>>>
>>> On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>> My YARN resource manager is consuming 100% CPU when I am running an
>>>> application that is running for about 10 hours, requesting as many as 27000
>>>> containers. The CPU consumption was very low at the starting of my
>>>> application, and it gradually went high to over 100%. Is this a known issue
>>>> or are we doing something wrong?
>>>>
>>>> Every dump of the EVent Processor thread is running
>>>> LeafQueue::assignContainers() specifically the for loop below from
>>>> LeafQueue.java and seems to be looping through some priority list.
>>>>
>>>> // Try to assign containers to applications in order
>>>> for (FiCaSchedulerApp application : activeApplications) {
>>>> ...
>>>> // Schedule in priority order
>>>> for (Priority priority : application.getPriorities()) {
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME *CPU usage total: 42334.614623696 secs*
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
>>>> (0x4FE8)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME CPU usage total: 42379.604203548 secs
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
>>>> (0xDFC0)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME CPU usage total: 42996.394528764 secs
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC
>>>> cycle=475576 (0x741B8)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>
Re: 100% CPU consumption by Resource Manager process
Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Thanks Wangda, I think I have reduced this when I was trying to reduce the
container allocation time.
-Kishore
On Tue, Aug 19, 2014 at 7:39 AM, Wangda Tan <wh...@gmail.com> wrote:
> Hi Krishna,
>
> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
> your configuration?
> 50
>
> I think this config is problematic, too small heartbeat-interval will
> cause NM contact RM too often. I would suggest you can set this value
> larger like 1000.
>
> Thanks,
> Wangda
>
>
>
> On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Wangda,
>> Thanks for the reply, here are the details, please see if you could
>> suggest anything.
>>
>> 1) Number of nodes and running app in the cluster
>> 2 nodes, and I am running my own application that keeps asking for
>> containers,
>> a) running something on the containers,
>> b) releasing the containers,
>> c) ask for more containers with incremented priority value, and repeat
>> the same process
>>
>> 2) What's the version of your Hadoop?
>> apache hadoop-2.4.0
>>
>> 3) Have you set
>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>> No
>>
>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>> in your configuration?
>> 50
>>
>>
>>
>>
>> On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan <wh...@gmail.com> wrote:
>>
>>> Hi Krishna,
>>> To get a better understanding of the problem, could you please share
>>> the following information:
>>> 1) Number of nodes and running app in the cluster
>>> 2) What's the version of your Hadoop?
>>> 3) Have you set
>>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>>> in your configuration?
>>>
>>> Thanks,
>>> Wangda Tan
>>>
>>>
>>>
>>> On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
>>> write2kishore@gmail.com> wrote:
>>>
>>>> Hi,
>>>> My YARN resource manager is consuming 100% CPU when I run an
>>>> application that runs for about 10 hours, requesting as many as 27000
>>>> containers. The CPU consumption was very low at the start of my
>>>> application, and it gradually rose to over 100%. Is this a known issue
>>>> or are we doing something wrong?
>>>>
>>>> Every dump of the Event Processor thread shows it running
>>>> LeafQueue::assignContainers(), specifically the for loop below from
>>>> LeafQueue.java, and it seems to be looping through some priority list.
>>>>
>>>> // Try to assign containers to applications in order
>>>> for (FiCaSchedulerApp application : activeApplications) {
>>>> ...
>>>> // Schedule in priority order
>>>> for (Priority priority : application.getPriorities()) {
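[Editor's note] The nested loops quoted above suggest why CPU use grows over the run: each request round adds a new, higher priority that the scheduler keeps iterating over. The following is a minimal sketch, not YARN code — the class name `PriorityGrowth` and the plain-integer stand-in for `Priority` are illustrative assumptions — showing how total inner-loop work becomes quadratic in the number of request rounds:

```java
import java.util.TreeSet;

public class PriorityGrowth {
    // Total inner-loop iterations after `rounds` request rounds, assuming
    // each round registers one new priority that is never retired
    // (a stand-in for application.getPriorities() growing over time).
    static long totalIterations(int rounds) {
        TreeSet<Integer> priorities = new TreeSet<>();
        long iterations = 0;
        for (int r = 1; r <= rounds; r++) {
            priorities.add(r);                 // new priority each round
            iterations += priorities.size();   // one scheduler pass walks them all
        }
        return iterations;
    }

    public static void main(String[] args) {
        // 27000 rounds => 27000 * 27001 / 2 iterations: quadratic growth
        System.out.println(totalIterations(27000)); // 364513500
    }
}
```

Under this model, 27000 container requests at ever-incrementing priorities would drive hundreds of millions of loop iterations, consistent with the thread dumps always landing inside `assignContainers()`.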
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME *CPU usage total: 42334.614623696 secs*
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
>>>> (0x4FE8)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME CPU usage total: 42379.604203548 secs
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
>>>> (0xDFC0)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native
>>>> priority:0x5, native policy:UNKNOWN)
>>>> 3XMTHREADINFO2 (native stack address range
>>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>>> 3XMCPUTIME CPU usage total: 42996.394528764 secs
>>>> 3XMHEAPALLOC Heap bytes allocated since last GC
>>>> cycle=475576 (0x741B8)
>>>> 3XMTHREADINFO3 Java callstack:
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>>> entry count: 1)
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 2)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>>> Code))
>>>> 5XESTACKTRACE (entered lock:
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>>> entry count: 1)
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>>> Code))
>>>> 4XESTACKTRACE at
>>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>>
>>>> Thanks,
>>>> Kishore
>>>>
>>>
>>>
>>
>
Re: 100% CPU consumption by Resource Manager process
Posted by Wangda Tan <wh...@gmail.com>.
Hi Krishna,
4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
your configuration?
50
I think this config is problematic; too small a heartbeat interval will
cause the NMs to contact the RM too often. I would suggest setting this
value larger, e.g. 1000.
Thanks,
Wangda
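[Editor's note] Concretely, the suggested change corresponds to a fragment like the following in yarn-site.xml. This is a sketch: the 1000 ms value is the suggestion above, not a tuned recommendation for any particular cluster.

```xml
<property>
  <!-- How often each NodeManager heartbeats the ResourceManager.
       At 50 ms, two NMs drive the scheduler on nearly every tick;
       1000 ms is a more typical setting. -->
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```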
On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi Wangda,
> Thanks for the reply, here are the details, please see if you could
> suggest anything.
>
> 1) Number of nodes and running app in the cluster
> 2 nodes, and I am running my own application that keeps asking for
> containers,
> a) running something on the containers,
> b) releasing the containers,
> c) asking for more containers with an incremented priority value, and
> repeating the same process
>
> 2) What's the version of your Hadoop?
> apache hadoop-2.4.0
>
> 3) Have you set
> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
> No
>
> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
> your configuration?
> 50
>
>
>
>
> On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan <wh...@gmail.com> wrote:
>
>> Hi Krishna,
>> To get a better understanding of the problem, could you please share
>> the following information:
>> 1) Number of nodes and running app in the cluster
>> 2) What's the version of your Hadoop?
>> 3) Have you set
>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>> in your configuration?
>>
>> Thanks,
>> Wangda Tan
>>
>>
>>
>> On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>> Hi,
>>> My YARN resource manager is consuming 100% CPU when I run an
>>> application that runs for about 10 hours, requesting as many as 27000
>>> containers. The CPU consumption was very low at the start of my
>>> application, and it gradually rose to over 100%. Is this a known issue
>>> or are we doing something wrong?
>>>
>>> Every dump of the Event Processor thread shows it running
>>> LeafQueue::assignContainers(), specifically the for loop below from
>>> LeafQueue.java, and it seems to be looping through some priority list.
>>>
>>> // Try to assign containers to applications in order
>>> for (FiCaSchedulerApp application : activeApplications) {
>>> ...
>>> // Schedule in priority order
>>> for (Priority priority : application.getPriorities()) {
>>>
>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>>> native policy:UNKNOWN)
>>> 3XMTHREADINFO2 (native stack address range
>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>> 3XMCPUTIME *CPU usage total: 42334.614623696 secs*
>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
>>> (0x4FE8)
>>> 3XMTHREADINFO3 Java callstack:
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>> entry count: 1)
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 2)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>
>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>>> native policy:UNKNOWN)
>>> 3XMTHREADINFO2 (native stack address range
>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>> 3XMCPUTIME CPU usage total: 42379.604203548 secs
>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
>>> (0xDFC0)
>>> 3XMTHREADINFO3 Java callstack:
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>> entry count: 1)
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 2)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>
>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>>> native policy:UNKNOWN)
>>> 3XMTHREADINFO2 (native stack address range
>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>> 3XMCPUTIME CPU usage total: 42996.394528764 secs
>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=475576
>>> (0x741B8)
>>> 3XMTHREADINFO3 Java callstack:
>>> 4XESTACKTRACE at
>>> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
>>> 4XESTACKTRACE at
>>> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>> entry count: 1)
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 2)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>
>>> Thanks,
>>> Kishore
>>>
>>
>>
>
Re: 100% CPU consumption by Resource Manager process
Posted by Wangda Tan <wh...@gmail.com>.
Hi Krishna,
4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
your configuration?
50
I think this config is problematic; too small a heartbeat interval causes
the NMs to contact the RM too often. I would suggest setting this value to
something larger, such as 1000.
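For reference, a sketch of the corresponding yarn-site.xml entry (the
property name is taken from the thread; the 1000 ms value is the
suggestion above):

```xml
<!-- yarn-site.xml: raise the NodeManager heartbeat interval
     from 50 ms to the suggested 1000 ms -->
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```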
Thanks,
Wangda
On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi Wangda,
> Thanks for the reply; here are the details. Please see if you can
> suggest anything.
>
> 1) Number of nodes and running app in the cluster
> 2 nodes, and I am running my own application that keeps asking for
> containers,
> a) running something on the containers,
> b) releasing the containers,
> c) asking for more containers with an incremented priority value, and
> repeating the same process
>
> 2) What's the version of your Hadoop?
> apache hadoop-2.4.0
>
> 3) Have you set
> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
> No
>
> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
> your configuration?
> 50
>
>
>
>
> On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan <wh...@gmail.com> wrote:
>
>> Hi Krishna,
>> To get more understanding about the problem, could you please share
>> following information:
>> 1) Number of nodes and running app in the cluster
>> 2) What's the version of your Hadoop?
>> 3) Have you set
>> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
>> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms"
>> in your configuration?
>>
>> Thanks,
>> Wangda Tan
>>
>>
>>
>> On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>> Hi,
>>> My YARN ResourceManager is consuming 100% CPU while running an
>>> application that runs for about 10 hours and requests as many as 27,000
>>> containers. The CPU consumption was very low at the start of the
>>> application and gradually climbed to over 100%. Is this a known issue,
>>> or are we doing something wrong?
>>>
>>> Every dump of the Event Processor thread shows it running
>>> LeafQueue::assignContainers(), specifically the for loop below from
>>> LeafQueue.java; it seems to be looping through some priority list.
>>>
>>> // Try to assign containers to applications in order
>>> for (FiCaSchedulerApp application : activeApplications) {
>>> ...
>>> // Schedule in priority order
>>> for (Priority priority : application.getPriorities()) {
>>>
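As an aside, the growth pattern described above can be illustrated with a
small hypothetical simulation (plain Python, not YARN code): if each
allocation round registers a new, higher priority and old priority entries
are never retired, the scheduler's per-heartbeat loop over priorities does
strictly more work each round.

```python
# Hypothetical simulation (not YARN code): why per-heartbeat scheduling
# work grows when each round asks with a new, higher priority and the
# old priority entries are never cleaned up.

class FakeApp:
    """Stand-in for an application's per-priority request table."""
    def __init__(self):
        self.requests_by_priority = {}  # priority -> outstanding containers

    def ask(self, priority, n):
        self.requests_by_priority[priority] = n

    def release_all(self, priority):
        # Requests are satisfied, but the now-empty priority entry
        # lingers, so the priority list only ever grows.
        self.requests_by_priority[priority] = 0

def schedule_pass(app):
    """One heartbeat-driven pass: visit every priority, like the inner loop."""
    visited = 0
    for _priority in sorted(app.requests_by_priority):
        visited += 1
    return visited

app = FakeApp()
work = []
for round_no in range(1, 6):
    app.ask(round_no, 10)          # new, higher priority each round
    work.append(schedule_pass(app))
    app.release_all(round_no)

print(work)  # each pass visits one more priority than the last
```

With 27,000 container requests at ever-increasing priorities, this kind of
unbounded iteration would plausibly produce exactly the gradual CPU climb
seen in the thread dumps.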
>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>>> native policy:UNKNOWN)
>>> 3XMTHREADINFO2 (native stack address range
>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>> 3XMCPUTIME *CPU usage total: 42334.614623696 secs*
>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
>>> (0x4FE8)
>>> 3XMTHREADINFO3 Java callstack:
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>> entry count: 1)
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 2)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>
>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>>> native policy:UNKNOWN)
>>> 3XMTHREADINFO2 (native stack address range
>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>> 3XMCPUTIME CPU usage total: 42379.604203548 secs
>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
>>> (0xDFC0)
>>> 3XMTHREADINFO3 Java callstack:
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>> entry count: 1)
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 2)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>
>>> 3XMTHREADINFO "ResourceManager Event Processor"
>>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>>> native policy:UNKNOWN)
>>> 3XMTHREADINFO2 (native stack address range
>>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>>> 3XMCPUTIME CPU usage total: 42996.394528764 secs
>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=475576
>>> (0x741B8)
>>> 3XMTHREADINFO3 Java callstack:
>>> 4XESTACKTRACE at
>>> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
>>> 4XESTACKTRACE at
>>> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>>> entry count: 1)
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 2)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>>> Code))
>>> 5XESTACKTRACE (entered lock:
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>>> entry count: 1)
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>>> Code))
>>> 4XESTACKTRACE at
>>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>>
>>> Thanks,
>>> Kishore
>>>
>>
>>
>
Re: 100% CPU consumption by Resource Manager process
Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Wangda,
Thanks for the reply; here are the details. Please see if you can
suggest anything.
1) Number of nodes and running app in the cluster
2 nodes, and I am running my own application that keeps asking for
containers:
a) running something on the containers,
b) releasing the containers,
c) asking for more containers with an incremented priority value, and
repeating the same process
2) What's the version of your Hadoop?
apache hadoop-2.4.0
3) Have you set
"yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
No
4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
your configuration?
50
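Since each request cycle above asks at a fresh, higher priority and never reuses an old one, the set of priorities the scheduler iterates per application can grow by one every cycle. The following is a minimal plain-JDK sketch (a hypothetical simulation, not Hadoop code; class and method names are made up for illustration) of why that pattern makes the per-event loop over priorities longer and longer, while reusing one fixed priority keeps it constant:

```java
import java.util.TreeSet;

// Sketch only: models an app's tracked priority set as a TreeSet<Integer>,
// analogous to the sorted collection behind application.getPriorities().
public class PriorityGrowthDemo {

    // Each cycle requests at a new, incremented priority; old entries
    // are never retired, so the set grows linearly with cycles.
    static int distinctPrioritiesIncrementing(int cycles) {
        TreeSet<Integer> priorities = new TreeSet<>();
        for (int p = 0; p < cycles; p++) {
            priorities.add(p); // fresh priority every cycle
        }
        return priorities.size(); // == cycles
    }

    // Same number of request cycles, but one constant priority is reused,
    // so the scheduler-side iteration stays O(1) per event.
    static int distinctPrioritiesFixed(int cycles) {
        TreeSet<Integer> priorities = new TreeSet<>();
        for (int i = 0; i < cycles; i++) {
            priorities.add(1); // constant priority
        }
        return priorities.size(); // == 1
    }

    public static void main(String[] args) {
        // 27000 container-request cycles, as in the reported run.
        System.out.println(distinctPrioritiesIncrementing(27000)); // 27000
        System.out.println(distinctPrioritiesFixed(27000));        // 1
    }
}
```

If this model matches the scheduler's bookkeeping, reusing a fixed priority for repeated requests (when the application semantics allow it) would keep each assignContainers() pass short instead of walking an ever-growing TreeMap.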
On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan <wh...@gmail.com> wrote:
> Hi Krishna,
> To get more understanding about the problem, could you please share
> following information:
> 1) Number of nodes and running app in the cluster
> 2) What's the version of your Hadoop?
> 3) Have you set
> "yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
> 4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
> your configuration?
>
> Thanks,
> Wangda Tan
>
>
>
> On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi,
>> My YARN resource manager is consuming 100% CPU when I am running an
>> application that is running for about 10 hours, requesting as many as 27000
>> containers. The CPU consumption was very low at the starting of my
>> application, and it gradually went high to over 100%. Is this a known issue
>> or are we doing something wrong?
>>
>> Every dump shows the Event Processor thread running
>> LeafQueue::assignContainers(), specifically the for loop below from
>> LeafQueue.java; it seems to be looping through some priority list.
>>
>> // Try to assign containers to applications in order
>> for (FiCaSchedulerApp application : activeApplications) {
>>   ...
>>   // Schedule in priority order
>>   for (Priority priority : application.getPriorities()) {
>>
>> 3XMTHREADINFO "ResourceManager Event Processor"
>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>> native policy:UNKNOWN)
>> 3XMTHREADINFO2 (native stack address range
>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>> 3XMCPUTIME *CPU usage total: 42334.614623696 secs*
>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
>> (0x4FE8)
>> 3XMTHREADINFO3 Java callstack:
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>> entry count: 1)
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 2)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>
>> 3XMTHREADINFO "ResourceManager Event Processor"
>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>> native policy:UNKNOWN)
>> 3XMTHREADINFO2 (native stack address range
>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>> 3XMCPUTIME CPU usage total: 42379.604203548 secs
>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
>> (0xDFC0)
>> 3XMTHREADINFO3 Java callstack:
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>> entry count: 1)
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 2)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>
>> 3XMTHREADINFO "ResourceManager Event Processor"
>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>> native policy:UNKNOWN)
>> 3XMTHREADINFO2 (native stack address range
>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>> 3XMCPUTIME CPU usage total: 42996.394528764 secs
>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=475576
>> (0x741B8)
>> 3XMTHREADINFO3 Java callstack:
>> 4XESTACKTRACE at
>> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
>> 4XESTACKTRACE at
>> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
>> Code))
>> 4XESTACKTRACE at
>> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>> entry count: 1)
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 2)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>
>> Thanks,
>> Kishore
>>
>
>
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>
>> 3XMTHREADINFO "ResourceManager Event Processor"
>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>> native policy:UNKNOWN)
>> 3XMTHREADINFO2 (native stack address range
>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>> 3XMCPUTIME CPU usage total: 42379.604203548 secs
>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
>> (0xDFC0)
>> 3XMTHREADINFO3 Java callstack:
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>> entry count: 1)
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 2)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>
>> 3XMTHREADINFO "ResourceManager Event Processor"
>> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
>> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
>> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
>> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
>> native policy:UNKNOWN)
>> 3XMTHREADINFO2 (native stack address range
>> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
>> 3XMCPUTIME CPU usage total: 42996.394528764 secs
>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=475576
>> (0x741B8)
>> 3XMTHREADINFO3 Java callstack:
>> 4XESTACKTRACE at
>> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
>> 4XESTACKTRACE at
>> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
>> Code))
>> 4XESTACKTRACE at
>> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
>> entry count: 1)
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 2)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
>> Code))
>> 5XESTACKTRACE (entered lock:
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
>> entry count: 1)
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
>> Code))
>> 4XESTACKTRACE at
>> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>>
>> Thanks,
>> Kishore
>>
>
>
Re: 100% CPU consumption by Resource Manager process
Posted by Wangda Tan <wh...@gmail.com>.
Hi Krishna,
To get more understanding about the problem, could you please share
following information:
1) Number of nodes and running app in the cluster
2) What's the version of your Hadoop?
3) Have you set
"yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
4) What's the "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" in
your configuration?
Thanks,
Wangda Tan
On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi,
> My YARN resource manager is consuming 100% CPU when I am running an
> application that is running for about 10 hours, requesting as many as 27000
> containers. The CPU consumption was very low at the starting of my
> application, and it gradually went high to over 100%. Is this a known issue
> or are we doing something wrong?
>
> Every dump of the Event Processor thread shows it running
> LeafQueue::assignContainers(), specifically the for loop below from
> LeafQueue.java, and it seems to be looping through some priority list.
>
> // Try to assign containers to applications in order
> for (FiCaSchedulerApp application : activeApplications) {
>   ...
>   // Schedule in priority order
>   for (Priority priority : application.getPriorities()) {
>
> 3XMTHREADINFO "ResourceManager Event Processor"
> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
> native policy:UNKNOWN)
> 3XMTHREADINFO2 (native stack address range
> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
> 3XMCPUTIME *CPU usage total: 42334.614623696 secs*
> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
> (0x4FE8)
> 3XMTHREADINFO3 Java callstack:
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
> entry count: 1)
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 2)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>
> 3XMTHREADINFO "ResourceManager Event Processor"
> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
> native policy:UNKNOWN)
> 3XMTHREADINFO2 (native stack address range
> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
> 3XMCPUTIME CPU usage total: 42379.604203548 secs
> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
> (0xDFC0)
> 3XMTHREADINFO3 Java callstack:
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
> entry count: 1)
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 2)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>
> 3XMTHREADINFO "ResourceManager Event Processor"
> J9VMThread:0x0000000001D08600, j9thread_t:0x00007F032D2FAA00,
> java/lang/Thread:0x000000008341D9A0, state:CW, prio=5
> 3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
> 3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5,
> native policy:UNKNOWN)
> 3XMTHREADINFO2 (native stack address range
> from:0x00007F0313DF8000, to:0x00007F0313E39000, size:0x41000)
> 3XMCPUTIME CPU usage total: 42996.394528764 secs
> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=475576
> (0x741B8)
> 3XMTHREADINFO3 Java callstack:
> 4XESTACKTRACE at
> java/util/TreeMap.successor(TreeMap.java:2001(Compiled Code))
> 4XESTACKTRACE at
> java/util/TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1127(Compiled
> Code))
> 4XESTACKTRACE at
> java/util/TreeMap$KeyIterator.next(TreeMap.java:1180(Compiled Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:838(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x000000008360DFE0,
> entry count: 1)
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x00000000833B9280,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 2)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>
> Thanks,
> Kishore
>
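For reference, the two settings Wangda asks about look roughly like this (a sketch only: `yarn.scheduler.capacity.*` properties belong in capacity-scheduler.xml and `yarn.resourcemanager.*` properties in yarn-site.xml; the values shown are illustrative, not recommendations):

```xml
<!-- capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
  <value>false</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>
```

A very low heartbeat interval (such as the 50 ms reported above) makes the RM run the CapacityScheduler allocation path far more often than the 1000 ms default, which amplifies any per-pass scheduling cost.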
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 2)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x0000000083360A80,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
> Code))
> 5XESTACKTRACE (entered lock:
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x00000000834037C8,
> entry count: 1)
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
> Code))
> 4XESTACKTRACE at
> org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)
>
> Thanks,
> Kishore
>
Re: 100% CPU consumption by Resource Manager process
Posted by Wangda Tan <wh...@gmail.com>.
Hi Krishna,
To get a better understanding of the problem, could you please share the
following information:
1) The number of nodes and running applications in the cluster
2) Which version of Hadoop are you running?
3) Have you set
"yarn.scheduler.capacity.schedule-asynchronously.enable" to true?
4) What is "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" set to in
your configuration?
Thanks,
Wangda Tan
On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:
> Hi,
> My YARN ResourceManager is consuming 100% CPU while I am running an
> application that runs for about 10 hours and requests as many as 27,000
> containers. CPU consumption was very low at the start of the application
> and gradually climbed to over 100%. Is this a known issue, or are we
> doing something wrong?
>
> Every dump of the Event Processor thread shows it running
> LeafQueue::assignContainers(), specifically the for loop below from
> LeafQueue.java, and it seems to be looping through some priority list.
>
> // Try to assign containers to applications in order
> for (FiCaSchedulerApp application : activeApplications) {
> ...
> // Schedule in priority order
> for (Priority priority : application.getPriorities()) {
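The cost of that nested loop can be sketched as follows. This is a hypothetical standalone model, not the actual scheduler code: on each node heartbeat the scheduler walks every active application and, for each, every requested priority, so per-heartbeat work grows with (applications x priorities) even when nothing can be assigned.

```java
import java.util.List;

// Hypothetical model of the nested scheduling loop quoted above;
// SchedulerLoopModel and workPerHeartbeat are illustrative names,
// not part of the YARN codebase.
public class SchedulerLoopModel {
    // Count how many inner-loop iterations one heartbeat performs:
    // for each application, one iteration per requested priority.
    static long workPerHeartbeat(List<Integer> prioritiesPerApp) {
        long iterations = 0;
        for (int priorities : prioritiesPerApp) {
            iterations += priorities; // inner "for (Priority priority : ...)" loop
        }
        return iterations;
    }

    public static void main(String[] args) {
        // One app that has accumulated 27000 outstanding requests:
        // every heartbeat pays the full cost again.
        System.out.println(workPerHeartbeat(List.of(27000)));
    }
}
```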
>
> [... three "ResourceManager Event Processor" thread dumps snipped;
> identical to the dumps quoted earlier in the thread ...]
>
> Thanks,
> Kishore
>