Posted to yarn-issues@hadoop.apache.org by "Bilwa S T (JIRA)" <ji...@apache.org> on 2019/08/12 05:12:00 UTC

[jira] [Created] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission

Bilwa S T created YARN-9738:
-------------------------------

             Summary: Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission
                 Key: YARN-9738
                 URL: https://issues.apache.org/jira/browse/YARN-9738
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Bilwa S T
            Assignee: Bilwa S T


*Env:*
Server OS: Ubuntu
No. of cluster nodes: 9120 NMs
Env mode: Secure

*Preconditions:*
~9120 NMs were running
~1250 applications were in the running state
35K applications were in the pending state

*Test Steps:*
1. Submit applications from 5 clients, each running 2 threads, across 10 queues in total.
2. As the application submission rate increases, submissions block (each distributed shell application calls getClusterNodes).

ClientRMService#getClusterNodes calls ClusterNodeTracker#getNodeReport, which has to acquire the read lock guarding the nodes map. Under this load the IPC handler threads park waiting for that lock:

{quote}
"IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 tid=0x00007f75095de000 nid=0x1949c waiting on condition [0x00007f74cff78000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f759f6d8858> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
{quote}

Instead, we can make nodes a ConcurrentHashMap and remove the read lock from getNodeReport.
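
A minimal sketch of the proposed change, assuming a single-key lookup is all getNodeReport needs. NodeTrackerSketch and NodeInfo are hypothetical stand-ins for illustration, not the actual YARN classes, and the report is simplified to a String:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ClusterNodeTracker; not the real YARN class.
public class NodeTrackerSketch {

    // Hypothetical stand-in for the per-node state a report is built from.
    public static final class NodeInfo {
        final String nodeId;
        final int runningContainers;

        public NodeInfo(String nodeId, int runningContainers) {
            this.nodeId = nodeId;
            this.runningContainers = runningContainers;
        }
    }

    // ConcurrentHashMap makes individual get/put/remove calls thread-safe,
    // so a single-key lookup does not need an external read lock.
    private final Map<String, NodeInfo> nodes = new ConcurrentHashMap<>();

    public void addNode(NodeInfo node) {
        nodes.put(node.nodeId, node);
    }

    public void removeNode(String nodeId) {
        nodes.remove(nodeId);
    }

    // No readLock.lock()/unlock() here: callers never queue behind writers.
    public String getNodeReport(String nodeId) {
        NodeInfo node = nodes.get(nodeId);
        return node == null ? null : node.nodeId + ":" + node.runningContainers;
    }

    public static void main(String[] args) {
        NodeTrackerSketch tracker = new NodeTrackerSketch();
        tracker.addNode(new NodeInfo("nm-1:45454", 3));
        System.out.println(tracker.getNodeReport("nm-1:45454"));
        System.out.println(tracker.getNodeReport("unknown-node"));
    }
}
```

This only covers the per-node lookup. Operations that need a consistent view across several nodes at once would presumably still need the existing lock or some other form of coordination.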






